5

I have a DataFrame df like this:

ID    NAME    AGE
-----------------
M43   ab      32
M32   df      12
M54   gh      34
M43   ab      98
M43   ab      36
M43   cd      32
M32   cd      39
M43   ab      67

I need to sort the rows based on the ID column.
The output df_grouped should look like:

ID    NAME    AGE
-----------------
M43   ab      32
M43   ab      98
M43   ab      36
M43   cd      32
M43   ab      67
M32   df      12
M32   cd      39
M54   gh      34

I tried something like:

df_grouped = df.groupby(df.ID)

grouped_df_list = []
for id in list(df.ID.unique()):
   grouped_df_list.append(df_grouped.get_group(id))

Is there a better way to do this?

10
  • 3
    That doesn't look like grouping - more like sorting... isn't df.sort_values('ID') what you're after? Commented Feb 6, 2018 at 17:34
  • Unfortunately my example looks like that, the ID column has - say 6 unique entries, I need to group rows in these six chunks. Commented Feb 6, 2018 at 17:36
  • Add more data and show a sample output with the grouping into six, please. Commented Feb 6, 2018 at 17:39
  • 2
    You want to have rows with identical IDs adjacent to each other but retain the order they originally appeared in the frame right? If so - your code example makes more sense - just a fairly poor choice of sample data and lack of explanation :) Commented Feb 6, 2018 at 17:51
  • 1
    @deadbug you can just use sort_values on ID. Try it. Commented Feb 6, 2018 at 17:56

2 Answers

7

You can sort by multiple columns using pd.DataFrame.sort_values:

df = df.sort_values(['ID', 'NAME'])

By default, the argument ascending is set to True.
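
As a minimal sketch of the call above, the following rebuilds the question's sample frame (the construction of df is an assumption made for illustration) and sorts it by ID and then NAME. Note that this orders the IDs alphabetically, so M32 comes before M43, which is not the first-appearance order shown in the question's expected output.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({
    "ID": ["M43", "M32", "M54", "M43", "M43", "M43", "M32", "M43"],
    "NAME": ["ab", "df", "gh", "ab", "ab", "cd", "cd", "ab"],
    "AGE": [32, 12, 34, 98, 36, 32, 39, 67],
})

# Sort by ID first, then by NAME within each ID (ascending by default).
df_sorted = df.sort_values(["ID", "NAME"])

# ascending also accepts one flag per column, e.g. ID descending, NAME ascending.
df_sorted_desc = df.sort_values(["ID", "NAME"], ascending=[False, True])

print(df_sorted)
```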

1

You can use pd.factorize to turn each key into an integer code that reflects the order in which it first appeared, then argsort those codes to get the row positions to index into your frame, e.g.:

Given:

     0   1   2
0  M43  ab  32
1  M32  df  12
2  M54  gh  34
3  M43  ab  98
4  M43  ab  36
5  M43  cd  32
6  M32  cd  39
7  M43  ab  67

Then:

new_df = df.loc[pd.factorize(df[0])[0].argsort()]
# might want to consider df.reindex() instead depending...

You get:

     0   1   2
0  M43  ab  32
3  M43  ab  98
4  M43  ab  36
5  M43  cd  32
7  M43  ab  67
1  M32  df  12
6  M32  cd  39
2  M54  gh  34
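
The same idea as a self-contained sketch, assuming the named columns from the question (ID, NAME, AGE) rather than the integer column labels used above. Two choices here are my own and not part of the answer: kind="stable" is passed to argsort so that rows with the same ID are guaranteed to keep their original relative order, and iloc is used because argsort returns positions rather than index labels.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({
    "ID": ["M43", "M32", "M54", "M43", "M43", "M43", "M32", "M43"],
    "NAME": ["ab", "df", "gh", "ab", "ab", "cd", "cd", "ab"],
    "AGE": [32, 12, 34, 98, 36, 32, 39, 67],
})

# factorize assigns 0 to the first ID seen, 1 to the next new ID, and so on.
codes, uniques = pd.factorize(df["ID"])

# A stable argsort groups equal codes together without reordering ties.
order = codes.argsort(kind="stable")

# argsort yields positions, so index positionally with iloc.
df_grouped = df.iloc[order]

print(df_grouped)
```

With a default RangeIndex, df.loc and df.iloc select the same rows here; iloc stays correct if the frame has a different index.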
