Here is my data:

In [14]: new_df
Out[14]: 
action_type                           1     2    3
user_id                                           
0000110e00f7c85f550b329dc3d76210   31.0   4.0  0.0
00004931fe12d6f678f67e375b3806e3    8.0   4.0  0.0
0000c2b8660766ed74bafd48599255f0    0.0   2.0  0.0
0000d8d4ea411b05e0392be855fe9756   19.0   0.0  3.0
ffff18540a9567b455bd5645873e56d5    1.0   0.0  0.0
ffff3c8cf716efa3ae6d3ecfedb2270b   58.0   2.0  0.0
ffffa5fe57d2ef322061513bf60362ff    0.0   2.0  0.0
ffffce218e2b4af7729a4737b8702950    1.0   0.0  0.0
ffffd17a96348904fe49216ba3c7006f    1.0   0.0  0.0

[9 rows x 3 columns]

In [15]: new_df.columns
Out[15]: Int64Index([1, 2, 3], dtype='int64', name=u'action_type')

In [16]: new_df.index
Out[16]: 
Index([u'0000110e00f7c85f550b329dc3d76210',
       u'00004931fe12d6f678f67e375b3806e3',
       ...
       u'ffffa5fe57d2ef322061513bf60362ff',
       u'ffffce218e2b4af7729a4737b8702950',
       u'ffffd17a96348904fe49216ba3c7006f'],
      dtype='object', name=u'user_id', length=9)

The output that I want is:

# sort by the action_type value 1

action_type                           1     2    3
user_id
ffff3c8cf716efa3ae6d3ecfedb2270b   58.0   2.0  0.0
0000110e00f7c85f550b329dc3d76210   31.0   4.0  0.0
0000d8d4ea411b05e0392be855fe9756   19.0   0.0  3.0
00004931fe12d6f678f67e375b3806e3    8.0   4.0  0.0
ffff18540a9567b455bd5645873e56d5    1.0   0.0  0.0
ffffce218e2b4af7729a4737b8702950    1.0   0.0  0.0
ffffd17a96348904fe49216ba3c7006f    1.0   0.0  0.0
0000c2b8660766ed74bafd48599255f0    0.0   2.0  0.0
ffffa5fe57d2ef322061513bf60362ff    0.0   2.0  0.0

[9 rows x 3 columns]

# sort by the action_type value 2

action_type                           1     2    3
user_id
00004931fe12d6f678f67e375b3806e3    8.0   4.0  0.0
0000110e00f7c85f550b329dc3d76210   31.0   4.0  0.0
ffff3c8cf716efa3ae6d3ecfedb2270b   58.0   2.0  0.0
0000c2b8660766ed74bafd48599255f0    0.0   2.0  0.0
ffffa5fe57d2ef322061513bf60362ff    0.0   2.0  0.0
0000d8d4ea411b05e0392be855fe9756   19.0   0.0  3.0
ffff18540a9567b455bd5645873e56d5    1.0   0.0  0.0
ffffce218e2b4af7729a4737b8702950    1.0   0.0  0.0
ffffd17a96348904fe49216ba3c7006f    1.0   0.0  0.0

[9 rows x 3 columns]

So, what I want to do is sort the DataFrame by the value of a single action_type column (1, 2 or 3) for each user, or by the sum of any combination of them for each user (action_type 1+2, 1+3, 2+3 or 1+2+3).

For example:

For user_id 0000110e00f7c85f550b329dc3d76210, the value of action_type 1 is 31.0, the value of action_type 2 is 4.0 and the value of action_type 3 is 0.0. The sum of action_type 1 and action_type 2 for this user is 31.0 + 4.0 = 35.0.

I have tried new_df.sortlevel(), but it seems to have sorted the DataFrame only by user_id, not by the action_type columns (1, 2, 3).

How can I do this? Thank you!

  • Would you mind posting the desired output and explaining what you mean by the sum of 1+2, 1+3, 2+3, 1+2+3? Commented May 7, 2016 at 8:32
  • @HugoHonorem, Hi, I have posted my desired output. Commented May 7, 2016 at 8:39
  • what is action_type(1, 2, 3) ? Commented May 7, 2016 at 9:00
  • Still a little bit unclear. What is your expected output for the first 4 rows only? Commented May 7, 2016 at 9:08
  • @HugoHonorem, I want to sort the dataframe by the action_type values. There are 3 action types: 1, 2 and 3. I want to sort the dataframe by the value of action_type 1, 2 or 3, or by the sum of action_type 1 and 2, the sum of action_type 1 and 3, the sum of action_type 2 and 3, or the sum of action_type 1, 2 and 3. The user_ids are unique. Commented May 7, 2016 at 9:40

1 Answer

UPDATE:

If you want to sort by column values, just use sort_values:

df.sort_values(column_names)

Example:

In [173]: df
Out[173]:
   1  2  3
0  6  3  8
1  0  8  0
2  3  8  0
3  5  2  7
4  1  2  1

Sort descending by column 2:

In [174]: df.sort_values(by=2, ascending=False)
Out[174]:
   1  2  3
1  0  8  0
2  3  8  0
0  6  3  8
3  5  2  7
4  1  2  1
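
When several rows tie on the sort column, sort_values also accepts a list of columns. A minimal sketch on the same example frame, assuming column 1 is wanted as the tie-breaker (that choice is mine, not something the question states):

```python
import pandas as pd

# Same example frame as above, with integer column labels 1, 2, 3.
df = pd.DataFrame({1: [6, 0, 3, 5, 1], 2: [3, 8, 8, 2, 2], 3: [8, 0, 0, 7, 1]})

# Sort by column 2 first; rows with equal values in column 2
# are then ordered by column 1, both descending.
result = df.sort_values(by=[2, 1], ascending=[False, False])
print(result)
```

Here rows 1 and 2 both have 8 in column 2, so row 2 (with 3 in column 1) comes before row 1 (with 0).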

Sort descending by the sum of columns 2+3:

In [177]: df.assign(sum=df.loc[:,[2,3]].sum(axis=1)).sort_values('sum', ascending=False)
Out[177]:
   1  2  3  sum
0  6  3  8   11
3  5  2  7    9
1  0  8  0    8
2  3  8  0    8
4  1  2  1    3
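
Applied to the frame from the question, the same idea can be wrapped in a small function that takes any combination of action types. This is only a sketch: sort_by_action_sum is a hypothetical helper name, the rows are copied from the question, and kind="mergesort" is used so that tied rows keep their original order. It relies on the user_ids being unique, as stated in the comments.

```python
import pandas as pd

# Rows copied from the question; column labels are the integers 1, 2, 3.
new_df = pd.DataFrame(
    {
        1: [31.0, 8.0, 0.0, 19.0, 1.0, 58.0, 0.0, 1.0, 1.0],
        2: [4.0, 4.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0],
        3: [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    },
    index=pd.Index(
        [
            "0000110e00f7c85f550b329dc3d76210",
            "00004931fe12d6f678f67e375b3806e3",
            "0000c2b8660766ed74bafd48599255f0",
            "0000d8d4ea411b05e0392be855fe9756",
            "ffff18540a9567b455bd5645873e56d5",
            "ffff3c8cf716efa3ae6d3ecfedb2270b",
            "ffffa5fe57d2ef322061513bf60362ff",
            "ffffce218e2b4af7729a4737b8702950",
            "ffffd17a96348904fe49216ba3c7006f",
        ],
        name="user_id",
    ),
)
new_df.columns.name = "action_type"


def sort_by_action_sum(df, action_types):
    """Hypothetical helper: sort rows by the row-wise sum of the
    given columns, largest first, without adding a helper column."""
    totals = df[list(action_types)].sum(axis=1)
    # A stable sort keeps tied rows in their original relative order.
    order = totals.sort_values(ascending=False, kind="mergesort").index
    return df.loc[order]


by_1 = sort_by_action_sum(new_df, [1])          # action_type 1 only
by_1_and_2 = sort_by_action_sum(new_df, [1, 2])  # sum of action_type 1 and 2
print(by_1)
print(by_1_and_2)
```

Passing [1, 3], [2, 3] or [1, 2, 3] covers the other combinations mentioned in the question.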

OLD answer:

If I understood you correctly, you can do it this way:

In [107]: df
Out[107]:
   a  b  c
0  9  1  4
1  0  5  7
2  5  9  8
3  3  9  7
4  1  2  5

In [108]: df.assign(sum=df.sum(axis=1)).sort_values('sum', ascending=True)
Out[108]:
   a  b  c  sum
4  1  2  5    8
1  0  5  7   12
0  9  1  4   14
3  3  9  7   19
2  5  9  8   22
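
If the helper column is only needed for ordering, it can be dropped again after sorting. A minimal sketch on the same a/b/c frame; the column name total is my own choice, and drop(columns=...) assumes a reasonably recent pandas version:

```python
import pandas as pd

# Same example frame as in the old answer.
df = pd.DataFrame({"a": [9, 0, 5, 3, 1], "b": [1, 5, 9, 9, 2], "c": [4, 7, 8, 7, 5]})

sorted_df = (
    df.assign(total=df.sum(axis=1))        # temporary row-wise sum
      .sort_values("total", ascending=True)
      .drop(columns="total")               # remove the helper column again
)
print(sorted_df)
```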

2 Comments

Sorry for not giving correct output of the data. Your answer is correct! Thanks.
@AlexanderYau, you are very welcome! It was a bit difficult to understand what you wanted to achieve before you posted examples of the desired output.
