Sorting pandas DataFrames based on criteria

Question

I have a pandas DataFrame with a structure as follows:

import pandas as pd
data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'], 'Cat2': ['X', 'Y', 'Z', 'X'], 'Counter': [0, 4, 1, 5]})

Now I want to add a separate column with a ranking by Cat1 (in this case: 1, 3, 2, 4 as the new column). My first try was:

data['ranking'] = data['ranking'] + data[data['Cat1'] == 'A']['Counter'].rank(ascending=0).fillna(0)

However, when I add the second category (data['Cat1']=='B' as the condition), it overwrites the existing values. I expected this, since as far as I understand I have to use .add() instead. However, the same happens with the following script:

data['ranking'].add(data[data['Cat1']=='A']['Counter'].rank(ascending=0))

This also overwrites all values where Cat1 == 'B' with NaN. How can I avoid this?

Thanks in advance!
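Why the NaNs appear can be sketched with the small frame from the question. Arithmetic between two Series aligns them on their index labels, and the ranked subset only contains the rows of one category, so every other row has no partner and becomes NaN. `Series.add` accepts a `fill_value` argument that substitutes a value for the missing side instead. Note also that `.add()` returns a new Series rather than modifying the column in place, so its result has to be assigned back. The initial column of zeros is an assumption based on the asker's comment below.

```python
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})
data['ranking'] = 0.0  # assumed initial value (the asker mentions zeros)

# Ranking only the 'A' rows yields a Series indexed by those rows alone (index [0]).
rank_a = data.loc[data['Cat1'] == 'A', 'Counter'].rank(ascending=False)

# .add() aligns on the index; rows missing from rank_a become NaN.
naive = data['ranking'].add(rank_a)

# fill_value=0 treats the missing side as 0, so the other rows keep their value.
filled = data['ranking'].add(rank_a, fill_value=0)

print(naive.tolist())   # [1.0, nan, nan, nan]
print(filled.tolist())  # [1.0, 0.0, 0.0, 0.0]

# Neither call changed the column itself; the result must be assigned back.
print(data['ranking'].tolist())  # [0.0, 0.0, 0.0, 0.0]
```

Repeating the `fill_value` addition once per category, assigning the result back to `data['ranking']` each time, accumulates the per-category ranks in a single column.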

-----------------------EDIT!!------------------

Let's say this is my table:

[Image: example table, not reproduced]

An ordinary rank would give me a ranking of all numbers from 1 through 12. What I need instead is a ranking based on the category, stored as an additional column in the original pandas DataFrame.

Hence, the last column should look like this: 2 (second-ranked value of a), 3 (third-ranked value of a), 1 (first-ranked value of a), 1 (first-ranked value of b), 1 (first-ranked value of c), 5, 2, ...

There has to be something I'm missing. data['ranking'] isn't even defined, so is there more logic between your first two lines? To do data['ranking'] = data['ranking'] + ..., data['ranking'] has to have an initial value from somewhere.
– Hoopdady, Commented Jan 29, 2013 at 13:33
Hey Hoopdady, yes, data['ranking'] is defined (say, as zeros). I left this step out because I suppose it doesn't really matter.
– oliver13, Commented Jan 29, 2013 at 13:37
So is data['ranking'] a list? I also just noticed that the key to your dictionary is a boolean: data[data['Cat1']=='a']... Is that what you want?
– Hoopdady, Commented Jan 29, 2013 at 13:39
data['ranking'] is a column of the pandas DataFrame 'data', basically an ndarray as far as I know. data[data['Cat1']=='a'] filters the rows where Cat1 == 'a'. In essence, what I want to do is sort the DataFrame based on another column. In Excel you can do it with SUMPRODUCT like this: mrexcel.com/forum/excel-questions/…
– oliver13, Commented Jan 29, 2013 at 13:51
OK, sorry; it looked like you had a simple Python error. I don't know how the pandas DataFrame works, but from everything I understand about Python, your data object has to be more than the simple dictionary I took it to be from its definition. Best of luck.
– Hoopdady, Commented Jan 29, 2013 at 14:01
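The filtering oliver13 describes is boolean indexing rather than a dictionary lookup. A minimal sketch on the example frame from the question (the choice of 'B' as the category is arbitrary): the comparison produces a boolean Series with one entry per row, and indexing the DataFrame with it keeps only the True rows, which retain their original index labels.

```python
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})

# data['Cat1'] == 'B' is not a dict key: it is a boolean Series, one value per row.
mask = data['Cat1'] == 'B'
print(mask.tolist())  # [False, True, True, False]

# Indexing the DataFrame with the mask keeps only the rows where it is True;
# those rows keep their original index labels (1 and 2 here).
subset = data[mask]
print(subset.index.tolist())       # [1, 2]
print(subset['Counter'].tolist())  # [4, 1]
```

Because the subset keeps labels 1 and 2, anything computed from it (such as a rank) lines up with the original rows when combined with a full-length column.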

herrfz · Accepted Answer · 2013-01-29 18:51:05Z


I'm not sure I understand your question correctly; maybe the following works?

data['Cat1'][data['Counter'].rank(ascending=0) - 1]

--EDIT--

As suggested in the comments, my solution would be:

data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=0)

I can't think of anything else, sorry. Maybe others will have a different perspective.
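A sketch of the groupby approach on the four-row frame from the question. `GroupBy.rank` ranks each Cat1 group independently and returns a Series with the same index as `data`, so it can be assigned directly as a new column. The thread does not settle the direction (the answer uses `ascending=0`, while the asker says ranks should be ascending with respect to Counter), so both are shown; the column name `ranking_asc` is only for illustration.

```python
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})

# Rank Counter separately within each Cat1 group; largest Counter gets rank 1.
# The result keeps data's index, so it can be assigned back as a column.
data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=False)

# ascending=True instead gives rank 1 to the smallest Counter in each group.
data['ranking_asc'] = data.groupby('Cat1')['Counter'].rank(ascending=True)

print(data['ranking'].tolist())      # [1.0, 1.0, 2.0, 1.0]
print(data['ranking_asc'].tolist())  # [1.0, 2.0, 1.0, 1.0]
```

The ranks come back as floats because ties get the average rank by default; passing `method='min'` or `method='dense'` changes how ties are numbered.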

edited Jan 29, 2013 at 18:51

answered Jan 29, 2013 at 15:19

herrfz


6 Comments

oliver13 Over a year ago

Hey herrfz, the idea is right, but this only ranks all values together, not using one column as the criterion. Say Cat1 has the values A, B, and C, and each of them has multiple values. I want each group ranked separately, with the result in one extra column of the DataFrame.

herrfz Over a year ago

What should the rank values be? In your example with two Bs, why do you have 3 and 2 as ranks? If you just do data[data['Cat1']=='B']['Counter'].rank(ascending=0), what you get is 1 and 2...

oliver13 Over a year ago

The rank values should be ascending with respect to Counter. I will edit the question to give an example of what I am looking for.

herrfz Over a year ago

OK, based on the new example, maybe try groupby? data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=0)

oliver13 Over a year ago

It's an option, but is there no "better way" without creating this hierarchy?
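On the "hierarchy" point: `groupby(...).rank()` does not produce a hierarchical index here; its result is indexed like the original rows. If avoiding groupby is still preferred, one alternative (swapped in here, not taken from the thread) is the asker's original idea of ranking one category at a time, but writing each result back with `.loc` onto just that category's rows, so the other rows are never overwritten. The descending direction and the initial zeros are assumptions; with many categories, this Python-level loop is generally slower than a single groupby.

```python
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})
data['ranking'] = 0.0  # assumed initial value

# Rank each category on its own rows and write only those rows back;
# .loc aligns the ranked subset on its index labels, leaving other rows untouched.
for cat in data['Cat1'].unique():
    mask = data['Cat1'] == cat
    data.loc[mask, 'ranking'] = data.loc[mask, 'Counter'].rank(ascending=False)

print(data['ranking'].tolist())  # [1.0, 1.0, 2.0, 1.0]
```

This yields the same values as `data.groupby('Cat1')['Counter'].rank(ascending=False)` on this frame.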
