Return index values of a sorted row using pandas?

Question

I just recently discovered the power of pandas. (Thanks Wes McKinney!) I have a csv that contains the following information:

RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11

Normally, I do not use pandas for this process. I use the csv library to generate lists and convert them using the datetime library. I then loop through each line and run something like the following to get the sorted index of each row:

'"' + ','.join(map(str, sorted(range(len(dates)), key=lambda k: dates[k]))) + '"'

It then returns something like this for each line:

Out[40]: '"1,0,2,3"'

I then add it at the end of each line as a new field in my csv.
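For reference, the loop-based process described above might look roughly like the sketch below. It is only an illustration under assumptions: the two data rows are copied from the sample csv, while the helper `sorted_index_field` and the new column name `SORT_ORDER` are hypothetical names that do not appear in the original script.

```python
import csv
from datetime import datetime
from io import StringIO

# Two sample rows from the csv above; a real script would open the file instead.
raw = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
"""

def sorted_index_field(dates):
    """Return the positions that would sort `dates`, as a quoted string (hypothetical helper)."""
    order = sorted(range(len(dates)), key=lambda k: dates[k])
    return '"' + ",".join(map(str, order)) + '"'

reader = csv.reader(StringIO(raw))
header = next(reader) + ["SORT_ORDER"]  # hypothetical name for the new field

out_rows = []
for row in reader:
    # Convert each cell to a datetime so the comparison is on dates, not text.
    dates = [datetime.strptime(cell, "%Y-%m-%d") for cell in row]
    out_rows.append(row + [sorted_index_field(dates)])

print(",".join(header))
for row in out_rows:
    print(",".join(row))
```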

I can read the csv into pandas and convert the items to the date dtype. I am just unsure how to get the sorted index values using pandas, flatten them into a string, and put them into a column. Any help is most appreciated!

HYRY · Accepted Answer · 2013-02-28 14:33:27Z

8

You can use numpy.argsort() to get the sort index:

from io import StringIO  # Python 3 location of StringIO
import numpy as np
import pandas as pd

txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11"""
df = pd.read_csv(StringIO(txt))
# Positions that would sort each row. ISO-formatted dates sort correctly as
# plain strings; kind="stable" keeps tied dates in left-to-right order.
idx = pd.DataFrame(np.argsort(df.to_numpy(), axis=1, kind="stable"), index=df.index)
buf = StringIO()
idx.to_csv(buf, index=False, header=False)
print(buf.getvalue())

The output:

1,0,2,3
3,2,1,0
2,1,0,3
2,3,1,0
0,1,3,2
3,0,1,2
2,0,3,1
3,0,2,1
1,0,2,3
3,1,0,2
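Sorting the raw strings works here only because the dates are in ISO `YYYY-MM-DD` form. If the columns are first converted to a date dtype, as the question mentions, the same idea could be sketched as follows. This is a hedged example rather than part of the original answer: it uses three rows from the sample data, and the variable names `dates` and `order` are illustrative.

```python
from io import StringIO
import numpy as np
import pandas as pd

txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-06,2013-01-16,2013-02-07,2013-01-11"""

df = pd.read_csv(StringIO(txt))

# Convert every column from text to datetime64 values.
dates = df.apply(pd.to_datetime)

# Row-wise argsort on the underlying datetime array.
# kind="stable" keeps tied dates (second row) in their original column order.
order = np.argsort(dates.to_numpy(), axis=1, kind="stable")
print(order)
```

The second row contains the same date in the first and last columns, which is why a stable sort is requested explicitly.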

2 Comments

BigHandsome Over a year ago

Thank you! This got me going in the right direction. Instead of writing the result out, I wanted to add it to a column, and I came up with the following code, which seems to work: df['AL_SQ'] = ['"' + ','.join(row) + '"' for row in idx[idx.columns].astype('str').values]

HYRY Over a year ago

You can get a string Series by: idx.apply(lambda s:",".join(s.astype(str)), axis=1) or buf.getvalue().split("\n") to get a string list.
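Putting the two comments together, a minimal sketch of storing the sort order as a string column might look like this. The column name `AL_SQ` comes from the comment above; everything else (two sample rows, the variable names) is assumed for illustration. The explicit surrounding quotes are left out here because `to_csv` with its default quoting already wraps any field containing a comma in double quotes.

```python
from io import StringIO
import numpy as np
import pandas as pd

txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06"""

df = pd.read_csv(StringIO(txt))

# Compute the row-wise sort order on parsed dates (stable for ties).
dates = df.apply(pd.to_datetime)
order = np.argsort(dates.to_numpy(), axis=1, kind="stable")

# Flatten each row of positions into one comma-separated string.
df["AL_SQ"] = [",".join(map(str, row)) for row in order]

# Fields containing commas are quoted automatically on output.
csv_text = df.to_csv(index=False)
print(csv_text)
```

If the quotes are added by hand as in the first comment, the default csv writer will escape them by doubling, so it may be simpler to let `to_csv` handle the quoting.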
