I have a DataFrame like the one below:

   |            types |     TypeList
0  |    Q11424 (item) |  Q11424 (item),Q571 (item)
1  |      Q571 (item) |  Q10 (item),Q24 (item)
0  |    Q11012 (item) |  Q3 (item)
0  |  Q4830453 (item) |  Q4 (item)
0  |  Q7725634 (item) |  Q67 (item),Q12 (item)

I want to sort the elements in each row of the TypeList column in ascending order, based on the integer part of each element. The desired output is:

   |            types |     TypeList
0  |    Q11424 (item) |  Q571 (item),Q11424 (item)
1  |      Q571 (item) |  Q10 (item),Q24 (item)
0  |    Q11012 (item) |  Q3 (item)
0  |  Q4830453 (item) |  Q4 (item)
0  |  Q7725634 (item) |  Q12 (item),Q67 (item)

I was able to remove all other characters from the TypeList column, keeping only the comma-separated numeric strings, and then converted each row to a list, so each row of this column is now a list of strings. I wanted to sort those lists, so I tried something like this:

df.TypeList.apply(lambda x: (int(y) for y in x))

but every row in the result has the value

<generator object <lambda>.<locals>.<genexpr> ...

I am not sure how to solve this. Can someone help me resolve it?

Thanks in advance.
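
A note on the generator output: a parenthesised comprehension such as (int(y) for y in x) is a generator expression, so apply stores the unevaluated generator in each cell. The sketch below assumes each row is a list of digit strings, as described above; the Series name type_list and its contents are hypothetical stand-ins for the cleaned column.

```python
import types

import pandas as pd

# Hypothetical stand-in for the cleaned column: each row is a list of digit strings.
type_list = pd.Series([["11424", "571"], ["10", "24"], ["3"], ["4"], ["67", "12"]])

# The attempt from the question: each cell ends up holding a generator object.
lazy = type_list.apply(lambda x: (int(y) for y in x))
print(isinstance(lazy.iloc[0], types.GeneratorType))

# sorted() consumes any iterable and always returns a list,
# so wrapping the generator in it both materialises and orders the values.
sorted_ids = type_list.apply(lambda x: sorted(int(y) for y in x))
print(sorted_ids.tolist())
```

Using sorted(x, key=int) instead would keep the elements as strings while still ordering them numerically.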

2 Answers

import re
import operator

for i in df.index:
    x = df.loc[i,'TypeList']
    # x ==  'Q11424 (item),Q571 (item)'
    y = x.split(',')
    y = {int(re.search(r'(?<=Q)\d+', k).group(0)):k for k in y}
    # y == {11424: 'Q11424 (item)', 571: 'Q571 (item)'}
    sorted_y = sorted(y.items(), key=operator.itemgetter(0))
    # sorted_y == [(571, 'Q571 (item)'), (11424, 'Q11424 (item)')]
    sorted_x = ','.join([item[1] for item in sorted_y])
    # sorted_x == 'Q571 (item),Q11424 (item)'
    df.loc[i, 'TypeList'] = sorted_x

This one doesn't use apply, as I'm not familiar with it. But I hope you get the idea.

4 Comments

This gives me the error "AttributeError: 'Series' object has no attribute 'split'".
@NilakshiNaphade Sorry, I didn't test it with any particular df. I just assumed your TypeList column contains strings. You might need to transform your cell contents depending on their type.
I tried converting to string with y = x.str.split(','), but then it gives the error "AttributeError: 'str' object has no attribute 'str'".
@NilakshiNaphade Can you provide an example df? Have you tried str(x) instead of x.str? The error seems to say that x is already a string. It's strange that it produces an error with x.split(',') then.
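
One possible explanation for both errors, offered as a guess from the sample in the question rather than a confirmed diagnosis: the index shown there repeats the label 0. With a repeated label, df.loc[i, 'TypeList'] returns a Series for that label (so .split fails), while for the unique label 1 it returns a plain string (so .str fails). The sketch below rebuilds the sample data under that assumption and shows that resetting the index makes the lookup return a single string.

```python
import pandas as pd

# Sample data from the question, including its repeated index label 0.
df = pd.DataFrame(
    {
        "types": ["Q11424 (item)", "Q571 (item)", "Q11012 (item)",
                  "Q4830453 (item)", "Q7725634 (item)"],
        "TypeList": ["Q11424 (item),Q571 (item)", "Q10 (item),Q24 (item)",
                     "Q3 (item)", "Q4 (item)", "Q67 (item),Q12 (item)"],
    },
    index=[0, 1, 0, 0, 0],
)

# With a repeated label, .loc returns every matching row as a Series.
lookup_before = df.loc[0, "TypeList"]
print(type(lookup_before).__name__)  # Series

# After resetting to a unique index, the same lookup returns one string.
df = df.reset_index(drop=True)
lookup_after = df.loc[0, "TypeList"]
print(type(lookup_after).__name__)  # str
```

If this is the cause, calling df.reset_index(drop=True) before the loop should let x.split(',') work on every row.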

Use sorted with parameter key:

df['TypeList'] = (df['TypeList'].str.split(',')
                                .apply(lambda x: sorted(x, key=lambda y: int(y.split()[0][1:])))
                                .str.join(','))
print(df['TypeList'])

0    Q571 (item),Q11424 (item)
1        Q10 (item),Q24 (item)
2                    Q3 (item)
3                    Q4 (item)
4        Q12 (item),Q67 (item)
Name: TypeList, dtype: object
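
For reuse, the same idea can be wrapped in a named function. This is a minimal sketch, assuming every entry starts with Q followed by digits, as in the sample data; the helper name sort_type_list is hypothetical, and the regular expression replaces the string slicing purely as an illustration. It also strips stray whitespace around entries and writes the result back to the column so the types column is kept.

```python
import re

import pandas as pd


def sort_type_list(cell):
    """Sort comma-separated 'Q<number> (item)' entries by their number."""
    entries = [entry.strip() for entry in cell.split(",")]
    # Sort key: the digits that directly follow the leading 'Q'.
    return ",".join(
        sorted(entries, key=lambda e: int(re.match(r"Q(\d+)", e).group(1)))
    )


# Sample data from the question, with a default 0..4 index.
df = pd.DataFrame(
    {
        "types": ["Q11424 (item)", "Q571 (item)", "Q11012 (item)",
                  "Q4830453 (item)", "Q7725634 (item)"],
        "TypeList": ["Q11424 (item),Q571 (item)", "Q10 (item),Q24 (item)",
                     "Q3 (item)", "Q4 (item)", "Q67 (item),Q12 (item)"],
    }
)

# Replace only the TypeList column; the types column is left untouched.
df["TypeList"] = df["TypeList"].apply(sort_type_list)
print(df)
```

An entry that does not match the assumed pattern would raise an AttributeError here, which may be preferable to silently mis-sorting it.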
