Pandas Dataframe: Sort list column in dataframe

Question

I have a dataframe as below:

   |            types |     TypeList
0  |    Q11424 (item) |  Q11424 (item),Q571 (item)
1  |      Q571 (item) |  Q10 (item),Q24 (item)
0  |    Q11012 (item) |  Q3 (item)
0  |  Q4830453 (item) |  Q4 (item)
0  |  Q7725634 (item) |  Q67 (item),Q12 (item)

I want to sort the elements in the TypeList column in ascending order, i.e. each row of TypeList should be sorted by the integer part of its entries. The output I want is:

   |            types |     TypeList
0  |    Q11424 (item) |  Q571 (item),Q11424 (item)
1  |      Q571 (item) |  Q10 (item),Q24 (item)
0  |    Q11012 (item) |  Q3 (item)
0  |  Q4830453 (item) |  Q4 (item)
0  |  Q7725634 (item) |  Q12 (item),Q67 (item)

I was able to remove all other characters from the TypeList column, keeping only the comma-separated strings, and then converted each row to a list, so each row of this column is now a list of strings. I wanted to sort those lists, so I tried something like this:

df.TypeList.apply(lambda x: (int(y) for y in x))

but the result has every row value as

<generator object <lambda>.<locals>.<genexpr> ...

I am not sure how to solve this issue. Can someone help me resolve it?

Thanks in advance.

lotrus28 · Accepted Answer · 2017-10-13 10:52:19Z

1

import re
import operator

for i in df.index:
    x = df.loc[i,'TypeList']
    # x ==  'Q11424 (item),Q571 (item)'
    y = x.split(',')
    y = {int(re.search(r'(?<=Q)\d+', k).group(0)):k for k in y}
    # y == {11424: 'Q11424 (item)', 571: 'Q571 (item)'}
    sorted_y = sorted(y.items(), key=operator.itemgetter(0))
    # sorted_y == [(571, 'Q571 (item)'), (11424, 'Q11424 (item)')]
    sorted_x = ','.join([pair[1] for pair in sorted_y])
    # sorted_x == 'Q571 (item),Q11424 (item)'
    df.loc[i, 'TypeList'] = sorted_x

This one doesn't use apply, as I'm not familiar with it. But I hope you get the idea.
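For comparison, here is a minimal sketch of the same regex idea written with apply. It assumes the TypeList cells are comma-separated strings as in the question's printout, and `sort_type_list` is a hypothetical helper name, not a pandas function. Sorting the split list directly, instead of building a dict keyed by number, also keeps entries whose numbers repeat.

```python
import re

import pandas as pd

# Sample data copied from the question; the column is assumed to hold
# comma-separated strings.
df = pd.DataFrame({
    "types": ["Q11424 (item)", "Q571 (item)", "Q11012 (item)",
              "Q4830453 (item)", "Q7725634 (item)"],
    "TypeList": ["Q11424 (item),Q571 (item)", "Q10 (item),Q24 (item)",
                 "Q3 (item)", "Q4 (item)", "Q67 (item),Q12 (item)"],
})


def sort_type_list(cell):
    """Sort the 'Q<number> (item)' entries of one cell by their number."""
    parts = cell.split(",")
    # Same regex as in the loop above: the digits that follow 'Q'.
    parts.sort(key=lambda part: int(re.search(r"(?<=Q)\d+", part).group(0)))
    return ",".join(parts)


# apply calls the helper once per cell and returns a new Series.
df["TypeList"] = df["TypeList"].apply(sort_type_list)
print(df["TypeList"].tolist())
```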

answered Oct 13, 2017 at 10:52


4 Comments

Nilakshi Naphade Over a year ago

It gives me the error "AttributeError: 'Series' object has no attribute 'split'"

lotrus28 Over a year ago

@NilakshiNaphade Sorry, I didn't test it with any particular df. I just assumed your TypeList column contains strings. You might need to transform your cell contents depending on their type.

Nilakshi Naphade Over a year ago

I tried converting to string with y = x.str.split(','), but then it gives the error "AttributeError: 'str' object has no attribute 'str'"

lotrus28 Over a year ago

@NilakshiNaphade Can you provide an example df? Have you tried str(x) instead of x.str? The error seems to say that x is already a string. It's strange that x.split(',') produces an error then.
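One plausible cause of the two errors, offered as a guess rather than a confirmed diagnosis: the printout in the question shows the index labels 0, 1, 0, 0, 0. If the real dataframe has duplicate labels like that, `df.loc[i, 'TypeList']` returns a Series for a repeated label and a plain string for a unique one, which would explain seeing both messages. The sketch below rebuilds that index by hand (an assumption) and shows that `reset_index(drop=True)` makes every lookup return a single string.

```python
import pandas as pd

# Index labels copied from the question's printout (0, 1, 0, 0, 0).
df = pd.DataFrame(
    {"TypeList": ["Q11424 (item),Q571 (item)", "Q10 (item),Q24 (item)",
                  "Q3 (item)", "Q4 (item)", "Q67 (item),Q12 (item)"]},
    index=[0, 1, 0, 0, 0],
)

# Label 0 occurs four times, so .loc returns a Series of four cells.
dup = df.loc[0, "TypeList"]
print(type(dup).__name__)     # Series

# Label 1 is unique, so .loc returns the plain string.
single = df.loc[1, "TypeList"]
print(type(single).__name__)  # str

# With a fresh, unique index every lookup returns one string.
df = df.reset_index(drop=True)
fixed = df.loc[0, "TypeList"]
print(type(fixed).__name__)   # str
```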

jezrael · Accepted Answer · 2017-10-13 13:04:43Z

1

Use sorted with the key parameter:

# Assign back to the column so the rest of the dataframe is kept
df['TypeList'] = (df['TypeList'].str.split(',')
                                .apply(lambda x: sorted(x, key=lambda y: int(y.split()[0][1:])))
                                .str.join(','))
print(df['TypeList'])

0    Q571 (item),Q11424 (item)
1        Q10 (item),Q24 (item)
2                    Q3 (item)
3                    Q4 (item)
4        Q12 (item),Q67 (item)
Name: TypeList, dtype: object
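If the column has already been converted to lists of digit strings, as the question describes, a similar approach should work without the split and join steps. The attempt in the question returned a generator expression from the lambda, and nothing ever consumed it, so each row ended up holding the generator object itself. Wrapping it in sorted() consumes the values and returns a list. A minimal sketch, where the sample Series is an assumption about what the preprocessed column looks like:

```python
import pandas as pd

# Assumed shape of the preprocessed column: each cell is a list of digit strings.
s = pd.Series([["11424", "571"], ["10", "24"], ["3"], ["4"], ["67", "12"]],
              name="TypeList")

# Convert to int and sort; each cell becomes a sorted list of integers.
as_ints = s.apply(lambda x: sorted(int(y) for y in x))

# Keep the strings but order them numerically rather than lexically.
as_strs = s.apply(lambda x: sorted(x, key=int))

print(as_ints.tolist())
print(as_strs.tolist())
```

Using `key=int` matters when the values stay as strings: a plain string sort would put "11424" before "571".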

edited Oct 13, 2017 at 13:04

answered Oct 13, 2017 at 12:58
