I'm trying to filter and sort a Pandas dataframe to clean my data. I've looked on StackOverflow and can't seem to find a method that will give me the sort and filter I need. The data I'm working with looks something like this:

| Name 1 | Name 2 | Score |
| ------ | ------ | ----- | 
| Amy | Jack | 2.456 | 
| Amy | Jack | 3.234 | 
| Amy | Jack | 5.124 | 
| ... | ... | ... | 
| Max | Jane | 8.569 |
| Max | Jane | 4.654 |
| Max | Jane | 6.349 |

What I want to do is make a new dataframe containing only the lowest score for each pair of names. The resulting dataframe would look something like this:

| Name 1 | Name 2 | Score |
| ------ | ------ | ----- | 
| Amy | Jack | 2.456 | 
| ... | ... | ...|
| Max | Jane | 4.654 | 
If you have more columns, use df.loc[df.groupby(['Name 1', 'Name 2'])['Score'].idxmin()] to keep the whole row for each minimum. Commented Apr 15, 2021 at 17:53
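
To try the idxmin() suggestion from the comment, here is a minimal sketch on a small sample frame. The "Round" column is hypothetical and only stands in for whatever extra columns your data has; the scores are the ones from the question.

```python
import pandas as pd

# Sample data from the question; "Round" is a made-up extra column for illustration.
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
    "Round": [1, 2, 3, 1, 2, 3],
})

# idxmin() gives, per group, the index label of the row with the lowest Score.
idx = df.groupby(["Name 1", "Name 2"])["Score"].idxmin()

# .loc selects those full rows, so columns other than Score are carried along.
lowest = df.loc[idx].reset_index(drop=True)
print(lowest)
```

Because the rows are selected by index label, this assumes the frame's index is unique.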

2 Answers

Use:

df = df.groupby(['Name 1', 'Name 2'], as_index=False).agg(Score=('Score', 'min'))

Output:

>>> df
  Name 1 Name 2  Score
0    Amy   Jack  2.456
1    Max   Jane  4.654
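
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample rows from the question and applies the same named aggregation. The variable name `result` is only for illustration.

```python
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
})

# Named aggregation: output column "Score" is the minimum of input column "Score".
# as_index=False keeps the grouping keys as ordinary columns instead of an index.
result = df.groupby(["Name 1", "Name 2"], as_index=False).agg(Score=("Score", "min"))
print(result)
```

Note that only the grouping keys and the aggregated column survive; any other columns in the original frame are dropped.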

You can also use the sort_values() and groupby() methods:

df.sort_values(by='Score').groupby(['Name 1', 'Name 2'], as_index=False).first()

OR

Use the sort_values() and drop_duplicates() methods:

df.sort_values(by='Score').drop_duplicates(subset=['Name 1', 'Name 2'])
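
Here is a short sketch that runs both variants on the sample rows from the question so you can compare them. The names `by_first` and `by_dedup` are only for illustration.

```python
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
})

# Sort ascending so the lowest score of each pair comes first.
ordered = df.sort_values(by="Score")

# Variant 1: take the first row of each group after sorting.
by_first = ordered.groupby(["Name 1", "Name 2"], as_index=False).first()

# Variant 2: keep only the first occurrence of each name pair (keep='first' is the default).
by_dedup = ordered.drop_duplicates(subset=["Name 1", "Name 2"]).reset_index(drop=True)

print(by_first)
print(by_dedup)
```

On this data both give the same two rows. They can differ on real data: with default settings, first() takes the first non-null value in each column separately and returns groups sorted by key, while drop_duplicates() keeps whole rows in score order.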
