Can't sort dataframe column, 'numpy.ndarray' object has no attribute 'sort_values', can't separate numbers with commas

Question

I am working with this csv https://drive.google.com/file/d/1o3Nna6CTdCRvRhszA01xB9chawhngGV7/view?usp=sharing

I am trying to sort by the 'Taxes' column, but when I use

import pandas as pd

df = pd.read_csv('statesFedTaxes.csv')
df.Taxes.values.sort_values()

I get

AttributeError: 'numpy.ndarray' object has no attribute 'sort_values'

This is baffling to me and I cannot find a similar problem online. How can I sort the data by the "Taxes" column?

EDIT: I should explain that my real problem is that when I use

df.sort_values('Taxes')

I get this output:

            State        Taxes
48     Washington  100,609,767
24      Minnesota  102,642,589
25    Mississippi   11,273,202
13          Idaho   11,343,181
30  New Hampshire   12,208,656
54  International   12,611,648
22  Massachusetts  120,035,203
40   Rhode Island   14,325,645
31     New Jersey  140,258,435

Therefore, I assume the commas are preventing my data from sorting properly. How do I get around this?

From the DataFrame.values docs: "We recommend using DataFrame.to_numpy() instead." (The name should help you understand the error: you are trying to sort a NumPy array.)
– Parfait, Commented Nov 21, 2020 at 23:24
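A minimal sketch of the point in Parfait's comment, using a small made-up DataFrame in place of the linked CSV (values are illustrative only): both `.values` and `.to_numpy()` return a plain NumPy array, which has no `sort_values` method, so the sort has to happen on the Series, or with `np.sort` on the array.

```python
import numpy as np
import pandas as pd

# Small stand-in for the linked CSV (illustrative values, already numeric)
df = pd.DataFrame({"State": ["Idaho", "Minnesota", "Washington"],
                   "Taxes": [11343181, 102642589, 100609767]})

arr = df["Taxes"].to_numpy()          # same kind of object as df.Taxes.values
print(type(arr))                      # <class 'numpy.ndarray'>
print(hasattr(arr, "sort_values"))    # False: the cause of the AttributeError

# Sort the Series (pandas method) or the array (NumPy function) instead
print(df["Taxes"].sort_values().to_numpy())
print(np.sort(arr))
```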

CJR · Accepted Answer · 2020-11-21 23:27:54Z

3

import pandas as pd
df = pd.DataFrame({"Taxes": ["1,000", "100", "100,000"]})

Your dataframe looks fine when we print it.

>>> df.sort_values(by="Taxes")
     Taxes
0    1,000
1      100
2  100,000

But the dtype is all wrong. These are strings (stored with object dtype), not numbers. When you call .values you get an array of... more strings, not numbers.

>>> df.dtypes
Taxes    object
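A small sketch of why string sorting produces the order shown in the question, using three of the question's values. Strings are compared character by character, so "102,642,589" lands before "11,273,202" because '0' < '1' at the second position.

```python
# A few Taxes values from the question, as the strings pandas reads them
values = ["11,273,202", "102,642,589", "100,609,767"]

# Lexicographic (string) order: compares characters left to right
print(sorted(values))
# ['100,609,767', '102,642,589', '11,273,202']

# Numeric order: strip the commas and compare integers instead
print(sorted(values, key=lambda s: int(s.replace(",", ""))))
# ['11,273,202', '100,609,767', '102,642,589']
```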

So turn them into numbers:

>>> df['Taxes'] = df['Taxes'].str.replace(",", "", regex=False).astype(int)

>>> df.sort_values(by="Taxes")
    Taxes
1     100
0    1000
2  100000

Now it's fine.

Another option is to read the file in with the thousands separator explicitly defined, which fixes the typing problem at load time.

df = pd.read_csv('statesFedTaxes.csv', thousands=",")
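A self-contained sketch of the `thousands` option, using `io.StringIO` with a few rows from the question in place of the real `statesFedTaxes.csv`. The quoting of the numbers is an assumption about how the file is written: a bare comma inside an unquoted field would otherwise split it.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for statesFedTaxes.csv; numbers that
# contain commas are quoted so the commas aren't read as field separators
csv_text = '''State,Taxes
Washington,"100,609,767"
Mississippi,"11,273,202"
Massachusetts,"120,035,203"
'''

df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df.dtypes)                 # Taxes now has an integer dtype, not object
print(df.sort_values("Taxes"))   # numeric order: Mississippi comes first
```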

answered Nov 21, 2020 at 23:27

CJR

3,987 reputation · 2 gold badges · 13 silver badges · 24 bronze badges


BenB · Accepted Answer · 2020-11-21 22:52:23Z

2

It's basically the reverse order: sort the DataFrame by the column first, then extract the values as an array:

df.sort_values("Taxes")["Taxes"].values
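A runnable version of that one-liner on a made-up frame (numbers already numeric, so the sort is meaningful): `sort_values` reorders the whole DataFrame, and only afterwards is the `Taxes` column pulled out as an array.

```python
import pandas as pd

# Illustrative data, not the real CSV
df = pd.DataFrame({"State": ["Washington", "Idaho", "Minnesota"],
                   "Taxes": [100609767, 11343181, 102642589]})

# Sort the whole frame first, then extract one column as a NumPy array
sorted_taxes = df.sort_values("Taxes")["Taxes"].values
print(sorted_taxes)
```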

answered Nov 21, 2020 at 22:52

BenB

658 reputation · 3 silver badges · 11 bronze badges

3 Comments

ACan Over a year ago

That's good for sorting the Taxes column, but I want to sort the whole dataframe by the Taxes column. If I do df['Taxes'] = df.sort_values('Taxes')['Taxes'].values, the 'State' column no longer lines up with the taxes.

BenB Over a year ago

Let's do this step by step: sort_values("Taxes") sorts the entire dataframe by Taxes. ["Taxes"] then extracts one column; if you want both columns, omit this bracket. Finally, .values converts the content to an array: a 1D array for one column, otherwise 2D.
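A sketch of that breakdown on illustrative data, also showing why the assignment in ACan's comment scrambled the `State` column: writing a sorted array back into one column leaves the other columns in their old order, whereas keeping the result of `sort_values` moves whole rows together.

```python
import pandas as pd

# Illustrative data, not the real CSV
df = pd.DataFrame({"State": ["Washington", "Idaho", "Minnesota"],
                   "Taxes": [100609767, 11343181, 102642589]})

# One column -> 1D array; whole frame -> 2D array
print(df.sort_values("Taxes")["Taxes"].values.shape)  # (3,)
print(df.sort_values("Taxes").values.shape)           # (3, 2)

# Overwriting just the Taxes column breaks the State/Taxes pairing
broken = df.copy()
broken["Taxes"] = df.sort_values("Taxes")["Taxes"].values

# Keeping the sorted DataFrame keeps each row intact
fixed = df.sort_values("Taxes")
print(broken)
print(fixed)
```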

BenB Over a year ago

Unfortunately, I am unable to add comments to other answers, so here is what might help with the problem mentioned in your edit: pass the argument thousands=',' to pd.read_csv so these numbers are interpreted correctly. It would then look like df = pd.read_csv('statesFedTaxes.csv', thousands=",")

Christopher Compeau · Accepted Answer · 2020-11-21 23:08:24Z

1

df.Taxes is a Series object, and df.Taxes.values is an ndarray object. In this case, you're not calling sort_values on the DataFrame df; you're trying to call it on the NumPy array holding the Taxes column's data, and ndarray has no sort_values method.

df.sort_values('Taxes') will give you df sorted on that column.
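If the `Taxes` column still holds comma-formatted strings, one option (assuming pandas 1.1 or later, where `sort_values` gained a `key` argument) is to sort on a numeric view of the column without changing the stored strings. A sketch with made-up data:

```python
import pandas as pd

# Taxes kept as strings, as read_csv returns them without thousands=","
df = pd.DataFrame({"State": ["Washington", "Mississippi", "Minnesota"],
                   "Taxes": ["100,609,767", "11,273,202", "102,642,589"]})

# key receives the Taxes Series; return an integer version to sort by
by_amount = df.sort_values(
    "Taxes",
    key=lambda s: s.str.replace(",", "", regex=False).astype(int),
)
print(by_amount)
```

The stored values stay as strings, so this suits display-only sorting; converting the column once (as in the other answers) is simpler if you also need arithmetic.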

answered Nov 21, 2020 at 23:08

Christopher Compeau

420 reputation · 4 silver badges · 13 bronze badges

2 Comments

ACan Over a year ago

I realize this should have been my original question: when I do that, the numbers are sorted with 100,000 on top, then 102,000, then 11,000, then 26,000, etc. In other words, the commas mess up the sorting. How do I overcome this?

Christopher Compeau Over a year ago

This means that the Taxes column holds strings, not ints. You'll need to remove the commas from the strings and then convert them to integers.
