custom sorting pandas dataframe

Question

I have a (very large) table using pandas.DataFrame. It contains wordcounts from texts; the index is the wordlist:

          one.txt  third.txt  two.txt
a               1          1        0
i               0          0        1
is              1          1        1
no              0          0        1
not             0          1        0
really          1          0        0
sentence        1          1        1
short           2          0        0
think           0          0        1

I want to sort the wordlist by the frequency of each word across all texts. I can easily create a Series that contains the frequency sum for each word (using the words as index), but how can I sort the DataFrame by this Series?

One easy way would be to add the Series to the DataFrame as a column, sort on it and then delete it. For performance reasons I would like to avoid this.
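
For reference, the add-a-column approach described above might look roughly like the sketch below in current pandas. The small DataFrame and the column name "total" are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Tiny stand-in for the real word-count table (hypothetical data).
df = pd.DataFrame(
    {"one.txt": [1, 0, 2], "two.txt": [0, 0, 1]},
    index=["a", "i", "short"],
)

# Add the per-word total as a temporary column ("total" is a hypothetical name).
df["total"] = df.sum(axis=1)

# Sort on the temporary column, then drop it again; the word index is kept.
result = df.sort_values("total", ascending=False).drop(columns="total")
print(result)
```

Both the sort and the drop return new objects here, so this sketch does not address the memory concern; it only shows the mechanics.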

Two other ways are described here, but one duplicates the DataFrame, which is a problem because of its size, and the other creates a new index, whereas I need the information about the words further down the line.

unutbu · Accepted Answer · 2015-12-06 18:42:27Z

You could compute the frequency totals and use the sort_values method to find the desired order of the index. Then use df.loc[order.index] to reorder the original DataFrame. (On pandas older than 0.17, where sort_values does not exist, use .sort(inplace=False) instead.)

order = df.sum(axis=1).sort_values()
result = df.loc[order.index]

For example,

import pandas as pd

df = pd.DataFrame({
    'one.txt': [1, 0, 1, 0, 0, 1, 1, 2, 0],
    'third.txt': [1, 0, 1, 0, 1, 0, 1, 0, 0],
    'two.txt': [0, 1, 1, 1, 0, 0, 1, 0, 1]},
    index=['a', 'i', 'is', 'no', 'not', 'really', 'sentence', 'short', 'think'])

# Total count per word across all texts, sorted from most to least frequent
order = df.sum(axis=1).sort_values(ascending=False)
print(df.loc[order.index])

yields (words with equal totals may appear in a different order, since the default sort is not stable)

          one.txt  third.txt  two.txt
sentence        1          1        1
is              1          1        1
short           2          0        0
a               1          1        0
think           0          0        1
really          1          0        0
not             0          1        0
no              0          0        1
i               0          0        1
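
If the order of tied words matters, one possible variant (a sketch, not part of the original answer) is to compute the row positions with NumPy's stable argsort and select them with df.iloc. The names totals and positions are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one.txt': [1, 0, 1, 0, 0, 1, 1, 2, 0],
    'third.txt': [1, 0, 1, 0, 1, 0, 1, 0, 0],
    'two.txt': [0, 1, 1, 1, 0, 0, 1, 0, 1]},
    index=['a', 'i', 'is', 'no', 'not', 'really', 'sentence', 'short', 'think'])

# Total count per word across all texts, as a plain NumPy array.
totals = df.sum(axis=1).to_numpy()

# Negate so the largest total comes first; kind="stable" keeps words with
# equal totals in their original index order.
positions = np.argsort(-totals, kind="stable")

# Select rows by position; the word index travels with the rows.
result = df.iloc[positions]
print(result)
```

Like df.loc[order.index], this returns a new DataFrame rather than reordering in place.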

edited Dec 6, 2015 at 18:42

answered Oct 5, 2013 at 10:58

2 Comments

fotis j Over a year ago

This solution does not work with the current version of pandas (0.16.2). I tested it with the same data on an earlier version, so I gather some recent change in pandas broke it. It raises a KeyError.

unutbu Over a year ago

@fotisj: Thanks for the warning. I've modified the answer to work with pandas 0.16.2.
