4

I have a Python Pandas Dataframe, in which a column named status contains three kinds of possible values: ok, must read x more books, does not read any books yet, where x is an integer higher than 0.

I want to sort status values according to the order above.

Example:

  name    status
0 Paul    ok
1 Jean    must read 1 more books
2 Robert  must read 2 more books
3 John    does not read any book yet

I've found some interesting hints, using Pandas Categorical and map but I don't know how to deal with variable values modifying strings.

How can I achieve that?

2
  • To be clear, do you need integers within your strings to be ordered, e.g. "must read 20 books" comes after "must read 3 books" ? Commented Aug 1, 2018 at 15:12
  • Yes, but not only that. The order have to be 'ok' first, 'must read 1 more books',...,'must read n more books' and, finally, 'does not read any book yet'. Commented Aug 1, 2018 at 15:15

2 Answers 2

10

Use:

a = df['status'].str.extract('(\d+)', expand=False).astype(float)

d = {'ok': a.max() + 1, 'does not read any book yet':-1}

df1 = df.iloc[(-df['status'].map(d).fillna(a)).argsort()]
print (df1)
     name                      status
0    Paul                          ok
2  Robert      must read 2 more books
1    Jean      must read 1 more books
3    John  does not read any book yet

Explanation:

  1. First extract integers by regex \d+
  2. Then dynamically create dictionary for map non numeric values
  3. Replace NaNs by fillna for numeric Series
  4. Get positions by argsort
  5. Select by iloc for sorted values
Sign up to request clarification or add additional context in comments.

1 Comment

your 'ok' and 'does not read any book yet' values should be switched, and the order of the rows reversed.
3

You can use sorted with a custom function to calculate the indices which would be sort an array (much like numpy.argsort). Then feed to pd.DataFrame.iloc:

df = pd.DataFrame({'name': ['Paul', 'Jean', 'Robert', 'John'],
                   'status': ['ok', 'must read 20 more books',
                              'must read 3 more books', 'does not read any book yet']})

def sort_key(x):
    if x[1] == 'ok':
        return -1
    elif x[1] == 'does not read any book yet':
        return np.inf
    else:
        return int(x[1].split()[2])

idx = [idx for idx, _ in sorted(enumerate(df['status']), key=sort_key)]

df = df.iloc[idx, :]

print(df)

     name                      status
0    Paul                          ok
2  Robert      must read 3 more books
1    Jean     must read 20 more books
3    John  does not read any book yet

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.