From the course: pandas Essential Training
Working with duplicates
- [Instructor] Have you ever been at a magic show where the magician pulls the same rabbit out of the hat multiple times? Well, having duplicate data in your pandas DataFrame is kind of like that, except it's not nearly as funny or entertaining. So let's take a look at our DataFrame nw, and you can see that we've got 10 columns and 2048 rows. Before we count the duplicates, let's take a look at what the duplicated method is all about. You can see that the duplicated method returns a boolean Series denoting the duplicate rows. If we sum that Series, you can see that we've got six duplicates here. And if I want to go ahead and display them, these are the six rows that are duplicates. Each of them is identical, in every column, to another row in the DataFrame. So we started with 2048 rows and 10 columns. If we go ahead and drop the duplicates, we should find that we're now at 2042 rows, which is exactly what we have here. Now, let's say I want to use…
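The steps the instructor walks through (flag the duplicates, count them, display them, then drop them) can be sketched roughly as follows. The course's actual `nw` data is not shown in this transcript, so the small DataFrame below is a hypothetical stand-in with made-up columns and values; only `duplicated()`, `sum()`, boolean indexing, and `drop_duplicates()` are the pandas calls the transcript describes.

```python
import pandas as pd

# Hypothetical stand-in for the course's `nw` DataFrame (NOT the real data,
# which has 10 columns and 2048 rows).
nw = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"],
        "year": [2001, 2002, 2001, 2003, 2002, 2001],
    }
)

# duplicated() returns a boolean Series: True for every row that repeats
# an earlier row across all columns.
dup_mask = nw.duplicated()

# Summing the boolean Series counts the duplicate rows.
n_duplicates = int(dup_mask.sum())

# Boolean indexing displays the duplicate rows themselves.
duplicate_rows = nw[dup_mask]

# drop_duplicates() keeps the first occurrence of each row and drops repeats.
deduped = nw.drop_duplicates()

print(n_duplicates)
print(len(nw), len(deduped))
```

By default both methods compare every column and treat the first occurrence as the original, which is why the row count after dropping should equal the original count minus the number of flagged duplicates, the same check the instructor makes when going from 2048 to 2042 rows.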
Contents
- Working with data types (dtype) (5m 8s)
- Memory usage of dtypes (4m 36s)
- Defining dtypes when you read in a file (3m 36s)
- Python functions (4m 50s)
- Working with indexes (6m 15s)
- Being productive in pandas: My best practices (9m 20s)
- Creating Series and DataFrames (2m 12s)
- Working with dates (4m 1s)
- Combining DataFrames (6m)
- Combining datasets (5m 8s)
- Working with missing data (5m 42s)
- Removing missing data (4m 17s)
- Working with duplicates (3m 10s)
- Validating data (7m 9s)
- Updating the dtypes (4m 47s)
- Combine the datasets (2m 16s)