From the course: Data Analysis with Python and Pandas

Solution: Final project

Let's dive into the notebook. First, we import pandas and NumPy, and then we read in project_transactions.csv. Before we do any column selection, let's read in the file as is and take a look at describe and info. If we read in the entire file, it's a 180-megabyte file. … but certainly eligible for Int16, which has a maximum value of 32,767. If we take a look at PRODUCT_ID and QUANTITY, these exceed that Int16 limit, so they're ineligible for Int16, but they're well within the range of the Int32 data type.

So we convert DAY to Int16, QUANTITY to Int32, and PRODUCT_ID to Int32. When we do this, the DataFrame takes up 75.7 megabytes, a sizable reduction from our original 180. And even for our reduced DataFrame alone, we still get about a 20-plus percent memory reduction, not too shabby. And then…
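The range check and downcast described above can be sketched as follows. This is a minimal illustration, not the course notebook: since project_transactions.csv isn't available here, it builds a small synthetic DataFrame with made-up values for DAY, QUANTITY, and PRODUCT_ID. It assumes pandas' nullable "Int16"/"Int32" dtypes (the capitalized names used in the walkthrough), and the helper `fits` is hypothetical. It confirms each column's min and max fall within the target type's limits (via `np.iinfo`) before converting, then compares memory with `memory_usage(deep=True)`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for project_transactions.csv: values are synthetic
# and chosen only to illustrate the dtype decisions, not the real data.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "DAY": rng.integers(1, 712, size=n),                   # small values: fits Int16
    "PRODUCT_ID": rng.integers(25_000, 18_000_000, size=n),  # too large for Int16
    "QUANTITY": rng.integers(0, 90_000, size=n),           # exceeds 32,767
})


def fits(series: pd.Series, dtype: str) -> bool:
    """Hypothetical helper: True if every value lies within dtype's range."""
    info = np.iinfo(dtype)
    return bool(series.min() >= info.min and series.max() <= info.max)


before = df.memory_usage(deep=True).sum()

# Target nullable dtypes, mirroring the conversions in the walkthrough.
targets = {"DAY": "Int16", "QUANTITY": "Int32", "PRODUCT_ID": "Int32"}

# Check ranges against the matching NumPy integer limits before converting.
for col, dtype in targets.items():
    assert fits(df[col], dtype.lower()), f"{col} overflows {dtype}"

df = df.astype(targets)
after = df.memory_usage(deep=True).sum()

print(df.dtypes)
print(f"{before / 1e6:.2f} MB -> {after / 1e6:.2f} MB")
```

Nullable "Int16"/"Int32" columns store a one-byte missing-value mask per row alongside the data, so lowercase NumPy "int16"/"int32" would save slightly more memory when a column has no missing values.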