From the course: Python-Powered Excel: Boost Your Excel Productivity with the Power of Python by Microsoft Press

Learn how to reference cells in Python

All right. So let's get started here, and the first thing that we're going to do is talk about how to reference cells in an Excel spreadsheet in Python. Now this will really form the foundation for everything else, because in order to do any kind of work with data that you have in an Excel spreadsheet, you need to be able to refer to specific cells and actually access those values inside Python. Before that, though, we're going to need to set up some sort of testing environment so that you can actually work with the commands and libraries that we're going to be taking a look at. I'm going to be using a Jupyter notebook to demonstrate all of this, and that's what I'd recommend you use as well. There are many other ways that you can do this, and I can even show you some more ways after we're done here if you want. But a Jupyter notebook is just going to make it a lot easier to write and run our code and see the results quickly, right? You can always run this in a local Python script or something like that if you want to, but that's a little bit more work, and it's a little bit more cumbersome for many people to work with. So if you want to use a Jupyter notebook, the easiest way to get started is to go to Anaconda.cloud and create an account. And once you've done that, you should be able to just click on this Jupyter notebooks link here. If it doesn't look quite the same as what you see here, just look for something that lets you launch a notebook, and that will bring up a screen that looks like this. All right. Jupyter notebooks are basically just interactive interfaces that you can use for writing and running Python code. They're heavily used in things like data science and data analysis. So that was another part of my logic for why I wanted to use Jupyter notebooks instead of just writing and running Python scripts: these are already very commonly used in the field.
So once you've got that set up, the next thing that we're going to do is talk about the basic process for referencing cells in an Excel spreadsheet from Python. So first of all, in order to access the data that's inside an Excel spreadsheet more easily, we'll usually use something called a library, right? Basically, just code that someone else has written that allows us to do exactly what we're looking to do with minimal code writing on our part. Someone else has already figured out the easiest way to access and work with Excel spreadsheets, so we're just going to use the tool that they created. Now, there are several libraries out there for this. Openpyxl is probably the most commonly used one, and that's the one that we're going to be using here. But there are others, such as pandas, that will allow you to load data from an Excel spreadsheet as well. So both of these ultimately just make working with data that's in an Excel spreadsheet a lot easier and allow us to get started almost immediately, as you'll see. All right. So the first thing to know here is how to get set up, how to actually load an Excel spreadsheet into your Python program, and then access an individual cell. Now this bullet point here shows that we can reference a cell using syntax similar to what you might use to access elements inside a dictionary in Python. So I'll show you what that looks like here. We're going to go back over to our Jupyter notebook, and the first thing that we're going to need, of course, is an Excel spreadsheet that we can actually use these functions on. So what I'm going to recommend you do is go to data.gov, which is basically just a very, very large collection of datasets. You can see that currently there are over 300,000 datasets available.
If you just click search without typing anything in there, you can see what the most popular ones are. And then you just need to look for datasets that have this little XLS label. And actually, there was one that I found recently, which is the fruit and vegetable prices. If you click on this and then go to download, that will take you to the webpage that has links to a series of Excel files for all sorts of different fruits and vegetables, right? So you can see that there are individual Excel files for fruit prices, things like that. So let's just pick apples, right? I'm going to click download XLSX, an Excel file, there. Bring that over into your Jupyter notebook and drag it into your files here. Okay, and you should see that you have a nice little Excel spreadsheet in there. Now, if you want to actually view this, you're going to need to do something like load it into Microsoft Excel. So just to show you what this is going to look like, there we go. Here is what that Excel spreadsheet looks like. All right. So there are a fair number of data points that we might want to access inside a Python program, all of these prices here. So what I'm going to do is start off by loading this Excel spreadsheet into our Jupyter notebook, and this would be the same kind of process if you were writing a Python script. So if you prefer to do that instead, go ahead and do that. But here's what this is going to look like. We're going to start off by using openpyxl. We'll say from openpyxl import load_workbook. Oops, that should be a lowercase there. And then we're going to say wb equals load_workbook, and here's where you're going to have the name of that file. Now this is a pretty, well, maybe it's not the worst file name out there, it's just Apples 2022. So we'll try that, Apples-2022.xlsx.
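Here's roughly what that loading step looks like written out. This is only a sketch: because the downloaded file isn't available outside the demo, the block first writes a tiny stand-in workbook to a temporary folder and then loads it the same way. The sheet name "Apples" and the header values are assumptions made up for illustration, not the contents of the real file.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Stand-in for the downloaded file (assumption: in the notebook, the real
# Apples-2022.xlsx would already be sitting in your files).
folder = tempfile.mkdtemp()
path = os.path.join(folder, "Apples-2022.xlsx")

stand_in = Workbook()
stand_in.active.title = "Apples"  # assumed sheet name
stand_in.active.append(["Form", "Average retail price", "Unit"])  # made-up header row
stand_in.save(path)

# The step from the walkthrough: load the whole workbook by file name.
wb = load_workbook(path)
print(wb.sheetnames)  # lists the name of every sheet in the workbook
```

With the real download in your notebook's files, the first part goes away and you would most likely just pass "Apples-2022.xlsx" to load_workbook.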
And now that we have that, right, a workbook is essentially the entire thing that you see here, including all of the sheets that the Excel document might contain. Now, in this case, it only has one sheet, so it's not really that big of a deal. But if you were to have more than one sheet that you wanted to be able to work with, each of those is referred to as a worksheet. So you just need to know the name of the sheet that you want to work with specifically, right? The problem here, as you'll see, is that when you have multiple sheets, there are multiple A1 cells, right? So, apples, this first sheet here has cell A1, but so does this next sheet, and Python needs to know which one you're referring to in the Excel document. So if we go back over to here, what we're going to see is we need to actually choose a sheet by saying ws equals, and then we're going to take that workbook that we just loaded and use the name of the sheet we want. So that's going to be the apples sheet in this case, right? Obviously, the exact name is going to depend on what sort of Excel document you're working with, but now that we have the worksheet, we can actually start accessing the values in some of those cells. So here's what this is going to look like. As I said back in the slides here, we can reference a cell just by using its letter and then number coordinates in square brackets after the sheet that we've selected. So let's try and select some prices here. Let's go back to our apples and just take a look at this. We can see that the average retail price, if we expand that so we can read it, the average retail price for, let's say, fresh apples is this cell B3. So let's try and access that. We'll just print out this value: we'll say ws, and then in a string here, B3. And if we run this thing, which you can do by pressing Shift and Enter, what we'll see is that it actually gives us a reference to the cell.
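To make the worksheet and cell lookup concrete, here's a small sketch. It builds a one-sheet workbook in memory so it can run without the download; the sheet name "Apples" and the price 1.85 are placeholders, so swap in whatever your file actually contains.

```python
from openpyxl import Workbook

# In-memory stand-in for the loaded workbook (assumed sheet name and price).
wb = Workbook()
wb.active.title = "Apples"
wb.active["B3"] = 1.85  # placeholder price, not the real figure

# Pick the worksheet by name, the same way you'd look up a dictionary key.
ws = wb["Apples"]

# Square brackets with the coordinate give back a Cell object, not the number.
cell = ws["B3"]
print(cell)             # <Cell 'Apples'.B3>
print(cell.coordinate)  # B3
```

Asking for a sheet name that doesn't exist raises a KeyError, which is one more way this behaves like a dictionary.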
Now, what happens if we want to access the cell's value? Well, all we have to do in that case, as you see in the slide here, is put .value after the cell, right? So cells, as you'll see in much more detail shortly, have a lot of other functionality that we might want to access besides just seeing the value inside. That's why the initial thing that you select is a cell object, and in order to get the value, you have to add a little bit of extra code there with .value. So let's just try that now. We'll say ws['B3'].value, and sure enough, we see that there is a value inside there. Now, notice here that this is not exactly what we saw in the cell. One key thing to understand is that in Excel, what you see here isn't necessarily the exact value that's being stored behind the scenes, because we can do things like round the displayed numbers in a column, depending on how precise we need them to be. And Excel does that in a way that avoids losing data, right? If we were to round these to the nearest dollar, let's say, this would be displayed as $2, but behind the scenes, Excel is still going to store whatever the original value was, so that if we wanted to go back to rounding to the nearest cent as we're seeing here, it would be able to access that original value, okay? So that's the basic idea there, and notice that this is also loaded for us as a float automatically. That's a pretty nice feature of working with Excel this way, right? You can see that this is in fact a float. Unlike reading text files or dealing with user input in Python, where the initial value is a string that you then have to manually convert to another type, the openpyxl library will usually do a pretty good job of giving us the type that we think it should be in the Excel spreadsheet. So that's the basics of referencing cells. Let's take a look at the last bullet point here.
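Here's a sketch of the difference between what's stored and what's displayed. It writes a made-up price with a dollar display format, saves the workbook to an in-memory buffer instead of a file, and loads it back. The value 1.8541 and the format string are assumptions for illustration only.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook with an invented price shown as dollars and cents.
source = Workbook()
sheet = source.active
sheet["A3"] = "Fresh"
sheet["B3"] = 1.8541                       # stored value (made up)
sheet["B3"].number_format = '"$"#,##0.00'  # display format only

# Round-trip through an in-memory buffer rather than a file on disk.
buffer = BytesIO()
source.save(buffer)
buffer.seek(0)
ws = load_workbook(buffer).active

price = ws["B3"].value
label = ws["A3"].value
print(price, type(price).__name__)  # the full stored number, as a float
print(label, type(label).__name__)  # text comes back as a str
```

Excel would show that cell as $1.85, but .value hands back the unrounded number, which matches what we saw in the notebook.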
We can actually modify cells as well by using this openpyxl library. Now, this is something you do want to be careful with, right? You want to make sure that what you're working with is a copy of the original Excel spreadsheet. So something that you've downloaded from the cloud, perhaps. Especially while you're getting used to the basics of working with Excel spreadsheets and Python, I would highly recommend that you don't work on the original spreadsheet using Python programs, right? There are just too many things that can go wrong where you'll unexpectedly, accidentally remove a lot of data from your sheet or something like that. So you can modify a cell in Python in pretty much the same way as you can modify, let's say, an entry in a dictionary in Python, and that is just by referring to the exact cell and saying equals along with whatever you want that new value to be. Now, notice here that we don't need to actually say ws['B3'].value equals, we can just say ws['B3'] equals. So let's try this, and let's say that the price of fresh apples goes way, way up. Let's set this to $100. That would be a rather frightening scenario there, but let's just give it a try. We're going to set this to 100, and if we hit Enter here, it's not going to print anything out, but that will modify the Excel spreadsheet in memory only. So one thing to notice here is that if we take a look at our spreadsheet again. Oops, well, we can't take a look at it here. But if you were to look at that right now, you would see that the B3 cell there would still have the original value, because by default, nothing is written back to the file, right? The changes only live in memory until we save, which protects us from modifying our Excel spreadsheet accidentally. If we want to actually save that, we need to say wb.save, right?
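Here's a sketch of that edit-then-save flow. It uses a stand-in file in a temporary folder so it can run on its own; the sheet name, the starting price, and the output name Apples-modified.xlsx are all hypothetical. It reloads the original from disk before saving to show that the in-memory change hasn't touched it.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

folder = tempfile.mkdtemp()
original_path = os.path.join(folder, "Apples-2022.xlsx")
modified_path = os.path.join(folder, "Apples-modified.xlsx")  # hypothetical name

# Stand-in for the downloaded spreadsheet (assumed sheet name and price).
stand_in = Workbook()
stand_in.active.title = "Apples"
stand_in.active["B3"] = 1.85
stand_in.save(original_path)

wb = load_workbook(original_path)
ws = wb["Apples"]

# Assigning to the cell changes the workbook in memory only.
ws["B3"] = 100

# The file on disk still holds the old value at this point.
on_disk = load_workbook(original_path)["Apples"]["B3"].value
print(on_disk)  # 1.85

# Saving under a different name leaves the original file untouched.
wb.save(modified_path)
saved = load_workbook(modified_path)["Apples"]["B3"].value
print(saved)  # 100
```

Passing the original file name to wb.save would overwrite that file, so a separate output name is the safer habit while you're learning.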
That will save any changes that we've made, like this one, and we can actually create a new file here. So this is another way that you can protect yourself from unexpected changes. We can say something like Apples-2100.xlsx. Maybe this is a doomsday scenario, right? In 2100, apples are $100 a pound. I guess it could just be from inflation, too, but we're just going to run that, and sure enough, what we'll see is that it creates a new Excel file. I'm just going to open that one up in Excel here. Just give me a second to do that. All right, there we go. So if you take a look at this, notice that the value has in fact been changed, and it's even still displayed with the correct format, right? It's still displayed with that dollar format. So that's really the nice part about this openpyxl library: it makes a lot of these operations really easy and intuitive. There's not a whole lot of extra work you have to do, generally speaking, in order to get these things to work. All right, and that's pretty much all I wanted to show you here. The last thing is for if you're interested in looping through cells in either a row or a column, right? So let's say that we wanted to get all of the prices along here. You can do that in openpyxl. In a new cell, we're going to say for, and we'll just start with rows here, I suppose. We'll say for row in ws.iter_rows, okay? So this is just a method that you can call on the worksheet object that will allow you to loop through rows in Python, like you would be able to loop through, say, a list or a dictionary, right? So here's what this is going to look like. First of all, we need to specify the minimum row, so the row that we want to start at. Let's say that we want to start at row 3, for example, and go to row 7.
What that's going to look like is we're going to say min_row equals 3, and then max_row equals, and we'll put 7 there, okay? And then we get to do the same with the columns as well. So let's say that we want just these three columns here, right? So column A through column C. Well, what you can say there is min_col, and actually, you do need to use numbers rather than letters for this. So we can say min_col equals 1, and then max_col equals 3, okay? So that's going to loop through that for us. And basically this for loop is going to give us each and every row, but inside each row you're going to have multiple cells, right? So what we're usually going to need to do in this case is say for cell in row. That goes through all of the cells that whatever row we're currently looking at contains. And we can just do something like, we'll say print, and you can actually get the coordinate of a cell. So if you have a reference to a cell like we have here and you don't actually know which one it is, you can say cell.coordinate, and that will give you the coordinates. So A1, B2, C5, whatever, and we'll say cell.value. So we'll just print both of those things out. And if we run this, sure enough, we see that gives us A3, Fresh1. B3, that gives us the price there. C3, per pound. A4, Applesauce2. For some reason, there are other numbers that are working their way in there. Oh, that's because those are like footnotes for the bottom. So that's just something that's getting attached on to the text. So that's how you can loop through rows and cells in a row. You can actually do the same thing in the other direction, where you loop through columns first and then get all of the cells in each column. You can do that just by using the column version of this method. That's just iter_cols, and you can put the same keyword args in there.
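The two loops can be sketched side by side like this. The table contents are invented stand-ins for rows 3 through 7 of the apples sheet, not the real figures. One thing to watch: in openpyxl the keyword arguments are spelled min_col and max_col, and they take column numbers rather than letters.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Invented stand-in data placed in rows 3-7, columns A-C.
table = [
    ("Fresh", 1.85, "per pound"),
    ("Applesauce", 1.17, "per pound"),
    ("Juice", None, None),
    ("Ready to drink", 0.78, "per pint"),
    ("Frozen", 0.61, "per pint"),
]
for offset, values in enumerate(table):
    for col_number, value in enumerate(values, start=1):
        ws.cell(row=3 + offset, column=col_number, value=value)

bounds = dict(min_row=3, max_row=7, min_col=1, max_col=3)

# Row by row: A3, B3, C3, then A4, B4, C4, and so on.
by_rows = []
for row in ws.iter_rows(**bounds):
    for cell in row:
        by_rows.append(cell.coordinate)
        print(cell.coordinate, cell.value)

# Column by column: A3 through A7, then B3 through B7, then C3 through C7.
by_cols = []
for col in ws.iter_cols(**bounds):
    for cell in col:
        by_cols.append(cell.coordinate)
        print(cell.coordinate, cell.value)
```

Both loops visit the same fifteen cells; only the order changes.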
So if we run that, we see that it gives us the same results just in a different order, right? We see Fresh1, Applesauce2, and juice, and ready to drink and frozen first. You can see that it's going through column A first, then it goes to column B, then it goes to column C. Whereas with iter_rows, there we go. You can see it does row 3, then row 4, then 5, then 6, then 7. So that's the basics of working with individual cells in an Excel spreadsheet using openpyxl. The last thing that I wanted to mention is that if you want to use openpyxl in a Python script that you're running locally, you're going to have to install it by saying pip install openpyxl. What I'm using here is the Anaconda distribution, meaning that it comes prepackaged with a lot of these most commonly used libraries and things like that. So I don't have to install that here. If you use Anaconda.cloud like I'm using here, you won't need to install it either. But if you're doing it locally or in another environment, you're going to have to install that yourself. So I just wanted to make you aware of that, and that's about it for working with individual cells in Excel with Python.
