3

I have the following pandas DataFrame with mixed data types (string and integer values). I want to sort it in descending order by multiple columns: Price and Name. The string values (i.e. Name) should be sorted alphabetically, or can be ignored entirely, because the numerical values matter most.

The problem is that the list of target columns can contain both string and integer columns, e.g. target_columns = ["Price","Name"]

import pandas as pd

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2],
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3],  '6': ['100', 'AAA', 3]}

columns=['Price', 'Name', 'Class']

target_columns = ['Price', 'Name']
order_per_cols = [False] * len(target_columns)

df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns
df.sort_values(list(target_columns), ascending=order_per_cols, inplace=True)

Currently, this code fails with the following message:

TypeError: '<' not supported between instances of 'str' and 'int'

The expected output:

Price    Name    Class
300      DDD     2
100      AAA     3
30       DDD     3
30       BBB     3
25       AAA     2
5        CCC     2
  • What is your expected output? Commented Dec 3, 2019 at 20:39
  • @Erfan: Please see my update. But as I said, the sorting of string columns can be ignored if there is some way to identify them in target_columns. Commented Dec 3, 2019 at 20:43
  • Then why not just df = df.sort_values('Price', ascending=False)? Commented Dec 3, 2019 at 20:44

2 Answers

1

If I understand you correctly, you want a generic way that excludes the object columns from your selection.

We can use DataFrame.select_dtypes for this, then sort on the numeric columns:

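# Price is stored as strings, so convert it first;
# otherwise select_dtypes('number') will not pick it up
df['Price'] = pd.to_numeric(df['Price'])
numeric = df[target_columns].select_dtypes('number').columns.tolist()
df = df.sort_values(numeric, ascending=[False]*len(numeric))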
   Price Name  Class
4    300  DDD      2
6    100  AAA      3
2     30  BBB      3
5     30  DDD      3
1     25  AAA      2
3      5  CCC      2
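If the column types are not known in advance, one possible approach is to try the conversion on every target column and keep it only where every value parses as a number. The following is a minimal sketch under that assumption, not a definitive implementation; the helper name coerce_numeric_columns is hypothetical and not part of pandas.

```python
import pandas as pd

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2],
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3], '6': ['100', 'AAA', 3]}
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = ['Price', 'Name', 'Class']
target_columns = ['Price', 'Name']


def coerce_numeric_columns(frame, columns):
    """Hypothetical helper: convert only the columns whose values all parse as numbers."""
    out = frame.copy()
    for col in columns:
        # Unparseable values become NaN instead of raising
        converted = pd.to_numeric(out[col], errors='coerce')
        # Keep the conversion only if it introduced no new missing values
        if converted.notna().equals(out[col].notna()):
            out[col] = converted
    return out


converted_df = coerce_numeric_columns(df, target_columns)

# Name stays a string column, so only Price is selected here
numeric = converted_df[target_columns].select_dtypes('number').columns.tolist()
result = converted_df.sort_values(numeric, ascending=[False] * len(numeric))
print(result)
```

Because only Price is used as a sort key here, the relative order of the two rows with Price 30 is not guaranteed.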

5 Comments

Does numeric permit int64, float?
Yes, quoted from documentation: "To select all numeric types, use np.number or 'number'"
Sorry, in my case all columns might be 'object'. Those that are indeed numeric might be also object or datetime, as you can see in case of Price.
Oh, haven't seen your line df['Price'] = pd.to_numeric(df['Price']). But how can I know that Price should be transformed to numeric? I do not have any info about column types in advance.
AttributeError: 'Series' object has no attribute 'select_dtypes'
0

Another option is to convert Price to numeric and then pass both columns to the by parameter of sort_values:

import pandas as pd

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2],
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3],  '6': ['100', 'AAA', 3]}

df = pd.DataFrame.from_dict(data=d, columns=['Price', 'Name', 'Class'], orient='index')
# Convert the string prices so they sort numerically rather than lexicographically
df['Price'] = pd.to_numeric(df['Price'])
# sort_values returns a new DataFrame, so assign the result
df = df.sort_values(by=['Price', 'Name'], ascending=False)
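If changing the stored Price values is not desirable, a variant is to leave the column as strings and convert only for the comparison through the key argument of sort_values. This is a sketch that assumes pandas 1.1 or newer (where, to my knowledge, key was introduced); the function name numeric_if_possible is hypothetical.

```python
import pandas as pd

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2],
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3], '6': ['100', 'AAA', 3]}
df = pd.DataFrame.from_dict(data=d, columns=['Price', 'Name', 'Class'], orient='index')


def numeric_if_possible(col):
    """Hypothetical key function: compare as numbers when every value parses."""
    converted = pd.to_numeric(col, errors='coerce')
    # Fall back to the original values (e.g. for Name) if anything failed to parse
    return converted if converted.notna().all() else col


# key is applied to each column in `by` independently
result = df.sort_values(by=['Price', 'Name'], ascending=False, key=numeric_if_possible)
print(result)
```

With this approach the Price column in the result still holds the original strings; only the ordering uses the numeric values.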
