
I have a dataframe

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date':['01-01-2020','01-01-2020','01-01-2020','01-01-2020','01-01-2020','01-01-2020'],
                  'Shift':['A','A','A','A','A','A'],
                  'heat_number':['HA1','HA10','HA8','HA18A','HA5','HA18']})

Looking like this

    Date        Shift   heat_number
0   01-01-2020    A        HA1
1   01-01-2020    A        HA10
2   01-01-2020    A        HA8
3   01-01-2020    A        HA18A
4   01-01-2020    A        HA5
5   01-01-2020    A        HA18

If I run df.sort_values(['Date','Shift','heat_number']), I get the output below:

    Date         Shift  heat_number
0   01-01-2020    A        HA1
1   01-01-2020    A        HA10
5   01-01-2020    A        HA18
3   01-01-2020    A        HA18A
4   01-01-2020    A        HA5
2   01-01-2020    A        HA8

But my desired output is:

    Date         Shift  heat_number
0   01-01-2020    A        HA1
4   01-01-2020    A        HA5
2   01-01-2020    A        HA8
1   01-01-2020    A        HA10
5   01-01-2020    A        HA18
3   01-01-2020    A        HA18A

The heat_number column is sorted as plain text, so HA10 comes before HA5, which is not what I expect. How can I fix this?

  • There's a pull request [https://github.com/pandas-dev/pandas/issues/3942#issuecomment-508588258] regarding this issue. Commented Jul 17, 2020 at 16:06

2 Answers

1

You can add a temporary column with DataFrame.assign that holds the number extracted from heat_number, call sort_values on that column, and finally drop the temporary column.

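(
    # Extract the digits as an integer helper column; expand=False returns a Series.
    df.assign(sort_by=df.heat_number.str.extract(r"(\d+)", expand=False).astype(int))
        # Sort numerically, with heat_number breaking ties such as HA18 vs HA18A.
        .sort_values(by=["Date", "Shift", "sort_by", "heat_number"])
        .drop(columns="sort_by")
)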

         Date Shift heat_number
0  01-01-2020     A         HA1
4  01-01-2020     A         HA5
2  01-01-2020     A         HA8
1  01-01-2020     A        HA10
5  01-01-2020     A        HA18
3  01-01-2020     A       HA18A
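
For readers who want to see the idea without pandas, here is a minimal sketch of the same "extract the number and sort on it" approach in plain Python. The helper name natural_key and the prefix/number/suffix split are assumptions made for illustration; they are not part of the answer above.

```python
import re

def natural_key(heat_number):
    """Split 'HA18A' into ('HA', 18, 'A') so the digits compare as a number."""
    match = re.fullmatch(r"(\D*)(\d+)(.*)", heat_number)
    if match is None:
        # No digits found: treat the whole string as the prefix.
        return (heat_number, -1, "")
    prefix, number, suffix = match.groups()
    return (prefix, int(number), suffix)

heat_numbers = ["HA1", "HA10", "HA8", "HA18A", "HA5", "HA18"]
ordered = sorted(heat_numbers, key=natural_key)
print(ordered)  # ['HA1', 'HA5', 'HA8', 'HA10', 'HA18', 'HA18A']
```

Keeping the suffix as the last element of the key is what places HA18 before HA18A when the numeric parts are equal.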

0

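Here is how I would go about it:

df['len_heat'] = df.heat_number.str.len()

df = df.sort_values(['Date','Shift','len_heat','heat_number'])

del df['len_heat']

Basically, it adds a column holding the length of each string, sorts on it (with heat_number as the last key so that values of equal length, such as HA5 and HA8, are still ordered), and then drops the column.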
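
To make the length-based idea easier to check, here is a small plain-Python sketch that sorts by string length and breaks ties alphabetically. The helper name length_key is hypothetical, and the second example is an assumed input (not from the question) that shows where the shortcut stops working.

```python
def length_key(heat_number):
    # Shorter strings first; equal-length strings in plain alphabetical order.
    return (len(heat_number), heat_number)

heat_numbers = ["HA1", "HA10", "HA8", "HA18A", "HA5", "HA18"]
by_length = sorted(heat_numbers, key=length_key)
print(by_length)  # ['HA1', 'HA5', 'HA8', 'HA10', 'HA18', 'HA18A']

# A letter suffix changes the length, so 'HA2A' and 'HA10' tie on length
# and fall back to text order, which puts 'HA10' first.
mixed = sorted(["HA10", "HA2A"], key=length_key)
print(mixed)  # ['HA10', 'HA2A']
```

The shortcut works for the sample data because every value shares the same prefix and only the last one carries a suffix; if suffixes can appear on smaller numbers, sorting on the extracted number is the safer choice.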
