Pandas: sorting numbers within text by column

Question

I'm trying to sort a dataframe by a column using df.sort_index. The column in question, the second one, holds strings made of numbers embedded in text. After the operation I get:

15 rs1820451 32681212 0.441 0.493 0.5358 98.9 29 0 0.441 T:A 
14 rs1820450 32680556 0.441 0.493 0.5358 98.9 29 0 0.441 G:C 
38 rs1820447 32693541 0.421 0.332 0.0915 94.4 26 0 0.211 G:A 
37 rs1820446 32693440 0.483 0.499 0.9633 100.0 30 0 0.475 G:T 
7 rs1808502 32660555 0.517 0.46 0.543 100.0 30 0 0.358 C:G 
24 rs17817908 32687035 0.407 0.362 0.6159 98.9 29 0 0.237 C:T 
22 rs17817896 32686160 0.407 0.362 0.6159 98.9 29 0 0.237 T:A 
66 rs17236946 32717247 0.492 0.453 0.7762 98.9 29 0 0.347 T:C

This isn't exactly what I want: the last three lines should come first. Is there another dataframe method or a workaround for this?

Wouter Overmeire · Accepted Answer · 2012-09-28 14:18:27Z

1

If you want to sort on one or more columns, you need to use df.sort(); df.sort_index() sorts on the index only.
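To illustrate the difference between the two calls, here is a minimal sketch on a hypothetical three-row frame modelled on the question's data (the column names `Name` and `Position` are assumptions). It uses `sort_values`, which replaced `df.sort()` in later pandas releases. Note that sorting the string column directly compares the IDs character by character, so the eight-digit ID does not land where a numeric sort would put it.

```python
import pandas as pd

# Hypothetical frame: column names and the choice of rows are assumptions.
df = pd.DataFrame(
    {
        "Name": ["rs1820451", "rs1808502", "rs17817908"],
        "Position": [32681212, 32660555, 32687035],
    },
    index=[15, 7, 24],
)

by_index = df.sort_index()            # orders rows by the index labels
by_name = df.sort_values(by="Name")   # orders rows by the strings in "Name"

print(by_index.index.tolist())
print(by_name["Name"].tolist())
```

Because the comparison is lexicographic, "rs17817908" sorts before "rs1808502" even though 17817908 is the larger number.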


1 Comment

fred Over a year ago

I tried outdata.sort(columns='Name', ascending=True, axis=0) and, unless I'm doing something wrong, it still doesn't work.

spiralx · Accepted Answer · 2012-09-28 14:16:00Z

0

This has no error checking or optimisation at all, but is this what you want?

def sort_on(lines, col_idx):
  # Sort whitespace-separated lines by the numeric value in column col_idx (0-based).
  return sorted(lines, key=lambda l: float(l.split()[col_idx]))

lines = """\
15 rs1820451 32681212 0.441 0.493 0.5358 98.9 29 0 0.441 T:A 
14 rs1820450 32680556 0.441 0.493 0.5358 98.9 29 0 0.441 G:C 
38 rs1820447 32693541 0.421 0.332 0.0915 94.4 26 0 0.211 G:A 
37 rs1820446 32693440 0.483 0.499 0.9633 100.0 30 0 0.475 G:T 
7 rs1808502 32660555 0.517 0.46 0.543 100.0 30 0 0.358 C:G 
24 rs17817908 32687035 0.407 0.362 0.6159 98.9 29 0 0.237 C:T 
22 rs17817896 32686160 0.407 0.362 0.6159 98.9 29 0 0.237 T:A 
66 rs17236946 32717247 0.492 0.453 0.7762 98.9 29 0 0.347 T:C
""".splitlines()

sorted_lines = sort_on(lines, 3)  # column 3 is the first floating-point field
print("\n".join(sorted_lines))
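The function above sorts on a purely numeric column. A variant that sorts on the number embedded in the rs identifier, which is what the question asks for, might look like the sketch below. The helper name `rs_number` and the shortened sample lines are assumptions, and the two-character "rs" prefix is taken from the data shown.

```python
def rs_number(line, col_idx=1):
    """Return the integer that follows the 'rs' prefix in the chosen column."""
    return int(line.split()[col_idx][2:])


# Shortened sample lines (first three fields only), for illustration.
lines = [
    "7 rs1808502 32660555",
    "24 rs17817908 32687035",
    "15 rs1820451 32681212",
]

# reverse=True puts the largest rs numbers first, as the question requests.
sorted_lines = sorted(lines, key=rs_number, reverse=True)
print("\n".join(sorted_lines))
```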


2 Comments

fred Over a year ago

Hi spiralx, thanks for helping. It works, but it isn't a feasible solution: this way I would have to convert my entire dataframe to a string.

spiralx Over a year ago

I can't see any obvious method other than subclassing DataFrame and overloading DataFrame.iteritems, or using DataFrame.apply to get a new df with the numeric values extracted. That, or generate the object with a different column structure to start with, might be the easiest.
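In pandas releases newer than this thread (1.1 and later), `sort_values` accepts a `key` argument, which allows sorting on the extracted numbers without subclassing or adding a column. A minimal sketch, assuming a hypothetical `Name` column and the two-character "rs" prefix seen in the data:

```python
import pandas as pd

# Hypothetical frame: the column name "Name" is an assumption.
df = pd.DataFrame(
    {"Name": ["rs1820451", "rs1808502", "rs17817908"]},
    index=[15, 7, 24],
)

# key receives the whole column as a Series and must return a Series of
# sort keys; here the "rs" prefix is sliced off and the rest cast to int.
result = df.sort_values(
    by="Name",
    key=lambda names: names.str[2:].astype(int),
    ascending=False,
)
print(result)
```

The original strings are left untouched; only the ordering uses the numeric values.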

fred · Accepted Answer · 2012-09-28 22:41:05Z

0

For future reference, here is a possible solution.

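    # Rows where neither L1 nor L2 equals rscode are dropped below.
    cond = ((df['L1'] != rscode) & (df['L2'] != rscode))
    outname = inf + '_test'
    # Helper column: the integer after the two-character 'rs' prefix.
    df['L3'] = df['L1'].map(lambda x: int(str(x)[2:]))
    # Note: DataFrame.sort was removed in pandas 0.20; use sort_values(by='L3', ascending=False) there.
    outdata = df.drop(df[cond].index.values).sort(columns='L3', ascending=False, axis=0)
    # export outdata using DataFrame.to_csv with the original df cols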

Improvements are welcome. Best,
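As one possible improvement, the same helper-column idea can be written as a self-contained sketch against a current pandas release. The sample frame and the `Pos` column are assumptions, and the `rscode` filtering step is left out because its inputs are not shown in the thread.

```python
import pandas as pd

# Hypothetical sample data; "L1" holds the rs identifiers as in the answer.
df = pd.DataFrame(
    {
        "L1": ["rs1820451", "rs1808502", "rs17817908"],
        "Pos": [32681212, 32660555, 32687035],
    }
)
original_cols = list(df.columns)

# Helper column with the numeric part of the identifier.
df["L3"] = df["L1"].str[2:].astype(int)
outdata = df.sort_values(by="L3", ascending=False)

# With no path given, to_csv returns the CSV text; columns= keeps the helper
# column out of the export.
csv_text = outdata.to_csv(columns=original_cols, index=False)
print(csv_text)
```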

