4

EDIT: To put in sample data df and expected output. EDIT 2: I've modified the data slightly so that the results are not uniformly largest number associated with 'cc' in each case.

My problem is:

  • I have a dataframe with two index columns that I am grouping by (Index1, Index2) and three columns (X, Y, Z)
  • I've created a groupby and applied a function to it (to scale all columns within the groupby object to 1)
  • I've summed across the resulting dataframe to get a sum for each row

The df is:

df = pd.DataFrame({'Index1': ['A', 'A', 'A', 'B', 'B', 'B'],
                'Index2': ['aa', 'bb', 'cc', 'aa', 'bb', 'cc'],
                'X': [1, 2, 7, 3, 6, 1],
                'Y': [2, 3, 6, 2, 4, 1],
                'Z': [3, 5, 9, 1, 2, 1]})

Then the code is:

df_scored = pd.DataFrame()   #new df to hold results
cats = [X, Y, Z]             #categories (columns of df) to be scaled
grouped = df.groupby([Index 1, Index 2]).sum()
for cat in cats :
    df_scored[cat] = grouped.groupby(level = 0)[cat].apply(lambda x: x / x.max())
df_scored['Score'] = df_scored.sum(axis = 1)

This produces:

                      X         Y         Z     Score
Index1 Index2                                        
A      aa      0.142857  0.333333  0.333333  0.809524
       bb      0.285714  0.500000  0.555556  1.341270
       cc      1.000000  1.000000  1.000000  3.000000
B      aa      0.500000  0.500000  0.500000  1.500000
       bb      1.000000  1.000000  1.000000  3.000000
       cc      0.166667  0.250000  0.500000  0.916667

Now I want to sort the resulting df_scored by each grouping of Index 1 (so that Index 2 is sorted by 'Score' within each group of Index 1), with this as the desired result:

                      X         Y         Z     Score
Index1 Index2                                        
A      cc      1.000000  1.000000  1.000000  3.000000
       bb      0.285714  0.500000  0.555556  1.341270
       aa      0.142857  0.333333  0.333333  0.809524
B      bb      1.000000  1.000000  1.000000  3.000000
       aa      0.500000  0.500000  0.500000  1.500000
       cc      0.166667  0.250000  0.500000  0.916667

How do I do this?

I've seen a few other questions on this here and here but not getting it to work for me in this case.

3
  • 4
    Can you provide us with some reproducible code / data for testing? Commented Feb 16, 2018 at 2:48
  • 3
    Create some dummy data with multiindex and expected output. That will help us help you. Commented Feb 16, 2018 at 3:00
  • @jp_data_analysis & Scott have tried to create the data / code / hoped for output above. This all works when I run it -- if I can clarify further please let me know Commented Feb 16, 2018 at 4:46

1 Answer 1

6

Add this at the end of your code

df_scored.sort_values('Score', ascending= False).sort_index(level='Index1', sort_remaining=False)
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks -- I think that's sorting by the index though, not by the 'Score' column? If I fiddle with the data to change it so cc's results aren't the highest across the board, it's only sorting alphabetically by 'Index2', not numerically by the 'Score' column.
@jbachlombardo Edited the answer. Hope it helps.
Thank you! Works like a charm, much appreciated.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.