
I have a pandas DataFrame with a structure as follows:

from pandas import DataFrame

data = DataFrame({'Cat1': ['A', 'B', 'B', 'C'], 'Cat2': ['X', 'Y', 'Z', 'X'], 'Counter': [0, 4, 1, 5]})

Now I want to add a separate column with a ranking by Cat1 (so in this case: 1, 3, 2, 4 as the new column). My first try was:

data['ranking'] = data['ranking'] + data[data['Cat1'] == 'A']['Counter'].rank(ascending=0).fillna(0)

However, when I add the second category (data['Cat1']=='B' as the condition), it overwrites the existing values. I expected this, since as far as I understand I have to use .add() instead. However, the same happens with the following script:

data['ranking'].add(data[data['Cat1']=='A']['Counter'].rank(ascending=0))

This also overwrites all values where Cat1 == 'B' with NaN. How can I avoid this?
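The NaN values come from index alignment: ranking only the filtered rows produces a Series that carries just the index labels of those rows, and `+` (or `.add()` without `fill_value`) yields NaN wherever one side lacks a label. A minimal sketch using the four-row frame above, assuming the goal is a descending rank of Counter within each Cat1 value (as described in the edit below); passing `fill_value=0` to `Series.add` makes the accumulation work:

```python
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})
data['ranking'] = 0.0  # initial column of zeros, as in the question

# Ranking only the 'A' rows gives a Series indexed by those rows alone.
rank_a = data.loc[data['Cat1'] == 'A', 'Counter'].rank(ascending=False)

# `+` aligns on the index: every row missing from rank_a becomes NaN.
with_plus = data['ranking'] + rank_a

# `.add(..., fill_value=0)` treats labels missing on one side as 0 instead.
ranking = data['ranking'].copy()
for cat in data['Cat1'].unique():
    subset = data.loc[data['Cat1'] == cat, 'Counter']
    ranking = ranking.add(subset.rank(ascending=False), fill_value=0)
data['ranking'] = ranking
```

Each pass only contributes values for its own category's rows, so earlier categories are no longer wiped out.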

Thanks in advance!

--EDIT--

Let's say this is my table:

(The table was posted as an image, which is not reproduced here.)

An ordinary rank would give me a ranking of all values, 1 through 12. What I need instead is a ranking within each category, stored as an additional column in the original DataFrame.

Hence, the last column should look something like: 2 (second-ranked value of a), 3 (third-ranked value of a), 1 (first-ranked value of a), 1 (first-ranked value of b), 1 (first-ranked value of c), 5, 2, ...

  • There has to be something I'm missing. data['ranking'] isn't even defined, so is there more logic between your first two lines? For data['ranking'] = data['ranking'] + ... to work, data['ranking'] needs an initial value from somewhere. Commented Jan 29, 2013 at 13:33
  • Hey Hoopdady, yes, data['ranking'] is defined, say as zeros. I left this step out as I suppose it doesn't really matter. Commented Jan 29, 2013 at 13:37
  • So is data['ranking'] a list? Also, I just noticed: the key to your dictionary is a boolean. data[data['Cat1']=='a']... is that what you want? Commented Jan 29, 2013 at 13:39
  • data['ranking'] is a column of the pandas DataFrame 'data', basically an ndarray as far as I know. data[data['Cat1']=='a'] filters Cat1 for all values == "a". What I want to do in essence is sort the DataFrame based on another column. In Excel you can do it with SUMPRODUCT like this: mrexcel.com/forum/excel-questions/… Commented Jan 29, 2013 at 13:51
  • OK. Sorry, it looked like you had a simple Python error. I don't know how pandas DataFrames work, but from everything I understand about Python, your data object has to be more than the simple dictionary I believed it was from its definition. Best of luck. Commented Jan 29, 2013 at 14:01
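The Excel SUMPRODUCT idea from the comments can be mimicked without groupby. The linked thread's URL is truncated, so this sketch assumes the formula counts how many rows in the same category have a larger Counter and adds 1; it does that with NumPy broadcasting. Ties share the smallest rank, like `rank(method='min')`. It builds an n-by-n comparison, so it's only reasonable for small frames:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})

cat = data['Cat1'].to_numpy()
cnt = data['Counter'].to_numpy()

# same[i, j] is True when rows i and j belong to the same category.
same = cat[:, None] == cat[None, :]
# larger[i, j] is True when row j's Counter exceeds row i's.
larger = cnt[None, :] > cnt[:, None]

# SUMPRODUCT-style count: larger values in the same category, plus 1.
data['ranking'] = (same & larger).sum(axis=1) + 1
```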

1 Answer


I'm not sure I understand your question correctly; maybe this works?

data['Cat1'][data['Counter'].rank(ascending=0) - 1]

--EDIT--

As suggested in the comments, my solution would be:

data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=False)
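A self-contained version of that one-liner on the frame from the question. The cast to int is an optional extra, not part of the original answer; it is safe here only because Counter has no missing values, since `rank()` returns floats and NaN cannot be stored in an integer column:

```python
import pandas as pd

data = pd.DataFrame({'Cat1': ['A', 'B', 'B', 'C'],
                     'Cat2': ['X', 'Y', 'Z', 'X'],
                     'Counter': [0, 4, 1, 5]})

# Rank Counter within each Cat1 group, largest value first.
# The result is indexed like `data`, so it can be assigned as a column directly.
data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=False)

# Optional: integer ranks (no NaN in Counter, so the cast is lossless).
data['ranking'] = data['ranking'].astype(int)
print(data)
```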

I can't think of anything else, sorry. Maybe others will have a different perspective.


6 Comments

  • Hey herrfz, the idea is right, but this only ranks across all values, not within groups defined by one column. Say Cat1 has the values A, B and C, and each of them has multiple values. I want them to be ranked separately, in one extra column of the DataFrame.
  • What should the rank values be? In your example with two Bs, why do you have 3 and 2 as ranks? If you just do data[data['Cat1']=='B']['Counter'].rank(ascending=0), what you get is 1 and 2...
  • The rank values should be ascending with respect to Counter. I will make an edit to give an example of what I am looking for.
  • OK, based on the new example, maybe try groupby? data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=0)
  • It's an option, but is there no "better way" without creating this hierarchy?
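On the "hierarchy" concern: `groupby(...).rank()` does not produce a hierarchical index; it returns a Series indexed exactly like the original rows. The sketch below uses a small hypothetical frame with a tie to show that, plus how the `method` argument changes tied ranks:

```python
import pandas as pd

# Hypothetical data: two tied Counter values in category 'A'.
df = pd.DataFrame({'Cat1': ['A', 'A', 'A', 'B'], 'Counter': [5, 5, 3, 2]})

grouped = df.groupby('Cat1')['Counter']

# Default method='average': tied rows share the mean of their ranks.
ranks = grouped.rank(ascending=False)
# The result keeps df's flat index; no MultiIndex is created.
assert ranks.index.equals(df.index)

# method='first': ties broken by order of appearance.
first = grouped.rank(ascending=False, method='first')
# method='dense': ties share a rank and the next rank is not skipped.
dense = grouped.rank(ascending=False, method='dense')
```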