6

As seen in the image below, I would like to sort the chats by Type in alphabetical order. However, I do not want to disturb the existing order of [Date, User_id] within each Chat name. How can I do this, given the input dataframe on the left? (Using pandas in Python)

enter image description here

3
  • 1
    pandas.pydata.org/pandas-docs/stable/generated/… Commented Oct 24, 2018 at 15:47
  • What you are describing is called a stable sort -- a sort that does not remove relative ordering. DataFrame.sort_values() offers multiple sorting algorithms via the kind argument. kind='mergesort' is documented as being a stable sort. Commented Oct 24, 2018 at 15:50
  • There is now a sort kind that is literally called 'stable', so kind='stable' may be easier to remember. Digging further, 'mergesort' is kept for backwards compatibility per the NumPy docs and doesn't necessarily perform a merge sort, depending on the data type; it is just 'stable' under the hood. Commented Jan 25, 2024 at 2:12

2 Answers

7

You want to sort the values using a stable sorting algorithm, which in pandas means passing kind='mergesort':

df.sort_values(by='Type', kind='mergesort') 

From the linked answer:

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted.

From pandas docs:

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’

Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
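
As a minimal sketch of what the stable sort does, the example below builds a small made-up frame (the column names follow the question, but the values are hypothetical) and sorts it on Type only. Rows that share a Type should keep the relative order they had in the input:

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
df = pd.DataFrame({
    "Chat": ["c1", "c2", "c1", "c3", "c2"],
    "Type": ["dog", "cat", "dog", "ant", "cat"],
    "Date": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"],
    "User_id": [10, 20, 30, 40, 50],
})

# Stable sort on Type: rows with equal Type keep their original relative order.
result = df.sort_values(by="Type", kind="mergesort")
print(result)
```

Here the two "cat" rows and the two "dog" rows stay in their input order, because only Type is used as the sort key.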


Update: As @ALollz correctly pointed out, it is better to convert all the values to lower case first and then sort; otherwise "Bird" will be placed before "alligator" in the result:

df['temp'] = df['Type'].str.lower()
df = df.sort_values(by='temp', kind='mergesort')
df = df.drop('temp', axis=1) 
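
An alternative sketch that avoids the temporary column, assuming pandas 1.1.0 or later, where sort_values accepts a key argument. The sample values below are made up for illustration:

```python
import pandas as pd

# Hypothetical data with mixed capitalization.
df = pd.DataFrame({
    "Type": ["Bird", "alligator", "bird", "Alligator"],
    "User_id": [1, 2, 3, 4],
})

# key receives the column as a Series and must return a Series of the same
# length; the lowercased values are used only for comparison, so the original
# capitalization is kept in the result.
result = df.sort_values(by="Type", key=lambda s: s.str.lower(), kind="mergesort")
print(result)
```

Because the sort is stable, rows whose lowercased Type is equal stay in their input order.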

1
df.sort_values(by=['Type']) [1]

You could also write your own sort function [2]; strings can be compared directly, e.g. stringRow2 < stringRow3.

[1] https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
[2] Sort pandas DataFrame with function over column values

1 Comment

This doesn't give the desired output... There are issues with capitalization in the data
