From the course: Data Analysis with Python and Pandas
Identifying and dropping duplicates
- [Instructor] Okay, so we just took a look at dropping rows and columns. Now let's look at a specific kind of row we might want to drop from our DataFrame: duplicates. Occasionally, when we read in data, we find that it contains a lot of duplicate rows. Other times, we might accidentally create duplicate rows ourselves, for example by dropping a key ID column or by performing a bad join. Either way, duplicates can negatively impact the quality of an analysis, so being able to identify them is helpful. Here we have a product DataFrame. Dairy is repeated three times in the product column, while Vegetables and Fruits each appear only once. In the price column, the first two Dairy rows both have a price of 2.56, so these two rows are exact duplicates of each other. The third instance of Dairy has a different price. One quick way to scan for duplicates is to compare the number of rows (here we have five) to the number of unique values in each column. So, the product column has three unique values…
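The instructor's code isn't shown here, so below is a minimal pandas sketch of the checks just described. It assumes a hypothetical reconstruction of the product DataFrame. Only the three Dairy rows, the single Vegetables and Fruits rows, and the repeated 2.56 price come from the transcript. The row order and the other prices are made-up placeholders. The sketch uses pandas' standard `nunique()`, `duplicated()`, and `drop_duplicates()` methods.

```python
import pandas as pd

# Hypothetical reconstruction of the product DataFrame from the transcript.
# Only the repeated Dairy price (2.56) comes from the text; the row order and
# all other prices are placeholder values chosen for illustration.
product = pd.DataFrame({
    "product": ["Dairy", "Dairy", "Vegetables", "Fruits", "Dairy"],
    "price": [2.56, 2.56, 4.55, 5.45, 3.99],
})

# Quick scan: compare the row count to the number of unique values per column.
n_rows = len(product)               # 5 rows
unique_counts = product.nunique()   # product -> 3, price -> 4

# duplicated() flags rows that repeat an earlier row across ALL columns
# (keep="first" by default, so the first occurrence is not flagged).
exact_dupes = product.duplicated()

# Restricting the check to one column flags every repeat of "Dairy",
# even though the third Dairy row has a different price.
product_dupes = product.duplicated(subset="product")

# drop_duplicates() removes the rows that duplicated() would flag.
deduped = product.drop_duplicates()

# Keep only the last row seen for each product instead of the first.
last_per_product = product.drop_duplicates(subset="product", keep="last")

print(n_rows)
print(unique_counts)
print(exact_dupes)
print(deduped)
print(last_per_product)
```

Here the product column has fewer unique values (3) than there are rows (5), which hints at repeats, but only `duplicated()` tells you whether entire rows repeat. With these placeholder values, one exact duplicate row is found, while the product-only check flags two repeated Dairy rows.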