I just recently discovered the power of pandas. (Thanks Wes McKinney!) I have a csv that contains the following information:

RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11

Normally, I do not use pandas for this process. I use the csv library to read the rows into lists and convert the values with the datetime library. I then loop through each line and run something like the following to get the sorted index of each row:

'"' + ','.join(map(str, sorted(range(len(dates)), key=lambda k: dates[k]))) + '"'

It then returns something like this for each line:

Out[40]: '"1,0,2,3"'

I then add it to the end of each line as a new field in my CSV.
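
For reference, the process described above can be sketched roughly as follows. This is a minimal illustration, assuming the dates are in YYYY-MM-DD format; the helper name sorted_index_field and the two-row in-memory sample are made up for the example and stand in for the real file.

```python
import csv
from datetime import datetime
from io import StringIO


def sorted_index_field(row):
    """Return the column positions of `row` ordered by date, as a quoted string."""
    # Assumption: every value is an ISO date such as 2013-01-24.
    dates = [datetime.strptime(value, "%Y-%m-%d") for value in row]
    order = sorted(range(len(dates)), key=lambda k: dates[k])
    return '"' + ",".join(map(str, order)) + '"'


# A two-row sample standing in for the real CSV file.
sample = StringIO(
    "RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE\n"
    "2013-01-24,2013-01-02,2013-01-30,2013-02-03\n"
    "2013-01-30,2013-01-21,2013-01-13,2013-01-06\n"
)

reader = csv.reader(sample)
header = next(reader)  # skip the header row
fields = [sorted_index_field(row) for row in reader]
print(fields)  # ['"1,0,2,3"', '"3,2,1,0"']
```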

I can read the CSV into pandas and convert the items to a date dtype. I am just unsure how to get the sorted index values using pandas, flatten them into a string, and put them into a column. Any help is most appreciated!

1 Answer

You can use numpy.argsort() to get the sort index:

from io import StringIO  # Python 3 location of StringIO
import numpy as np
import pandas as pd

txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11"""
df = pd.read_csv(StringIO(txt))
# The dates are read as ISO-formatted strings, which sort in date order.
# Argsort each row; kind="stable" keeps tied dates in column order.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")
idx = pd.DataFrame(order, index=df.index)
buf = StringIO()
idx.to_csv(buf, index=False, header=False)
print(buf.getvalue())

The output:

1,0,2,3
3,2,1,0
2,1,0,3
2,3,1,0
0,1,3,2
3,0,1,2
2,0,3,1
3,0,2,1
1,0,2,3
3,1,0,2
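
If the columns have already been converted to a date dtype, as the question mentions, the same idea should still apply. The following is a sketch under that assumption, using the first three rows of the sample data; the variable names dates, order, and idx are only illustrative.

```python
from io import StringIO

import numpy as np
import pandas as pd

txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29"""

df = pd.read_csv(StringIO(txt))

# Convert every column to datetime64 (assumes YYYY-MM-DD values throughout).
dates = df.apply(pd.to_datetime, format="%Y-%m-%d")

# Argsort each row of the underlying array; a stable sort keeps
# tied dates (row 3 has two) in their original column order.
order = np.argsort(dates.to_numpy(), axis=1, kind="stable")
idx = pd.DataFrame(order, index=df.index)
print(idx)
```

Working on the plain NumPy array rather than the DataFrame itself avoids depending on how np.argsort wraps a DataFrame argument.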
2 Comments

Thank you! This got me going in the right direction. Instead of writing the result out, I wanted to add it as a column, and I came up with the following code, which seems to work: df['AL_SQ'] = ['"' + ','.join(row) + '"' for row in idx[idx.columns].astype('str').values]
You can get a Series of strings with idx.apply(lambda s: ",".join(s.astype(str)), axis=1), or a list of strings with buf.getvalue().splitlines().
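
Putting the two comments together, adding the flattened sort order as a new column might look like the sketch below. The column name AL_SQ comes from the first comment; the two-row frame is just a small stand-in for the real data, and the dates are assumed to be ISO-formatted strings.

```python
import numpy as np
import pandas as pd

# Two rows of the sample data, kept as ISO-formatted strings.
df = pd.DataFrame(
    {
        "RUN_START_DATE": ["2013-01-24", "2013-01-30"],
        "PUSHUP_START_DATE": ["2013-01-02", "2013-01-21"],
        "SITUP_START_DATE": ["2013-01-30", "2013-01-13"],
        "PULLUP_START_DATE": ["2013-02-03", "2013-01-06"],
    }
)

# Compute the per-row sort order before adding the new column,
# so the new column is not included in the sort itself.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")

# Flatten each row of positions into a quoted, comma-separated string.
df["AL_SQ"] = ['"' + ",".join(map(str, row)) + '"' for row in order]
print(df["AL_SQ"].tolist())  # ['"1,0,2,3"', '"3,2,1,0"']
```

If the frame is later written with to_csv, the literal quote characters may be unnecessary, since the CSV writer by default quotes any field that contains a comma.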
