From the course: Data Analysis with Python and Pandas


Identifying and dropping duplicates


- [Instructor] Okay, so we just took a look at dropping rows and columns. Let's take a look at a specific case of rows that we might want to drop from our DataFrame. Occasionally when we read in data, we might find that there are a lot of duplicates in our data. Other times we might accidentally create duplicate rows by dropping a key ID column or by performing a bad join. Either way, sometimes duplicates can negatively impact the quality of an analysis, so being able to identify them is going to be helpful. Here we have a product DataFrame. Dairy is repeated three times in the product column. Vegetables and Fruits are unique. In our price column, we have 2.56 for our first two rows of Dairy, so these are exact duplicates of each other. The third instance of Dairy has a different price. One quick way to scan for duplicates would be to compare the number of rows, so, here we have five rows, to the number of unique values in each column. So, the product column has three unique values…
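The scan described above can be sketched in pandas. This is a hypothetical reconstruction of the product DataFrame from the transcript (the price of the third Dairy row is not stated, so 3.10 is an assumed value), using `nunique()` to compare unique counts against the row count, `duplicated()` to flag exact duplicate rows, and `drop_duplicates()` to remove them:

```python
import pandas as pd

# Hypothetical reconstruction of the transcript's product DataFrame:
# "Dairy" appears three times; the first two Dairy rows are exact
# duplicates at 2.56. The third Dairy price (3.10) is assumed.
df = pd.DataFrame({
    "product": ["Dairy", "Dairy", "Dairy", "Vegetables", "Fruits"],
    "price": [2.56, 2.56, 3.10, 1.25, 0.99],
})

# Quick scan: compare the number of rows to the unique values per column.
print(len(df))                # 5 rows
print(df.nunique())           # product has 3 unique values

# Flag exact duplicate rows; by default the first occurrence is kept
# (marked False) and later repeats are marked True.
print(df.duplicated())        # only the second 2.56 Dairy row is True

# Drop the exact duplicates, keeping the first occurrence of each row.
deduped = df.drop_duplicates()
print(len(deduped))           # 4 rows remain
```

By default `duplicated()` and `drop_duplicates()` consider all columns, so the third Dairy row survives because its price differs; passing `subset=["product"]` would instead treat any repeated product name as a duplicate.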
