I have a DataFrame in pandas that I'd like to select a subset of rows from based on the values of two columns.

import pandas as pd

test_df = pd.DataFrame({'Topic': ['A', 'A', 'A', 'B', 'B'], 'Characteristic': ['Population', 'Other', 'Other', 'Other', 'Other'], 'Total': [25, 22, 21, 20, 30]})

It works as expected and returns the first row when I use this code:

bool1 = test_df['Topic']=='A' 
bool2 = test_df['Characteristic']=='Population'

test_df[bool1 & bool2]

But when I try to do it all in one line as below,

test_df[test_df['Topic']=='A' & test_df['Characteristic']=='Population']

I get "TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]"

Why? Is there a good way to do this in a single step?

1 Answer

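You only need to add parentheses. In Python, & binds more tightly than ==, so without them Python evaluates 'A' & test_df['Characteristic'] first, which fails: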

>>> test_df[(test_df['Topic']=='A') & (test_df['Characteristic']=='Population')]
  Characteristic Topic  Total
0     Population     A     25
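
To see why the parentheses matter, here is a small sketch that asks Python's own parser how it groups the unparenthesized condition. The names a and b are placeholders standing in for the two column selections; they are not part of the original code.

```python
import ast

import pandas as pd

test_df = pd.DataFrame({
    'Topic': ['A', 'A', 'A', 'B', 'B'],
    'Characteristic': ['Population', 'Other', 'Other', 'Other', 'Other'],
    'Total': [25, 22, 21, 20, 30],
})

# Parse the unparenthesized form; a and b are placeholder names for the columns.
tree = ast.parse("a == 'A' & b == 'Population'", mode="eval")

# The top level is one chained comparison: a == (...) == 'Population'
is_chained_compare = (
    isinstance(tree.body, ast.Compare) and len(tree.body.comparators) == 2
)
# The middle operand is the bitwise-and ('A' & b), grouped before either ==
middle_is_bitand = (
    isinstance(tree.body.comparators[0], ast.BinOp)
    and isinstance(tree.body.comparators[0].op, ast.BitAnd)
)

# With explicit parentheses, each comparison yields a boolean Series first,
# and & then combines the two masks element-wise.
mask = (test_df['Topic'] == 'A') & (test_df['Characteristic'] == 'Population')
result = test_df[mask]
print(result)
```

Both flags come out True, which matches the explanation that 'A' & test_df['Characteristic'] is evaluated before either comparison.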

Alternatively, you could use the query method, to avoid the repetition of test_df:

>>> test_df.query("Topic == 'A' and Characteristic == 'Population'")
  Characteristic Topic  Total
0     Population     A     25
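
If the values to match live in Python variables, query can refer to them with an @ prefix. The sketch below goes slightly beyond the example above: the variable names topic and characteristic are made up for illustration, and it checks that the query form selects the same rows as the parenthesized boolean mask.

```python
import pandas as pd

test_df = pd.DataFrame({
    'Topic': ['A', 'A', 'A', 'B', 'B'],
    'Characteristic': ['Population', 'Other', 'Other', 'Other', 'Other'],
    'Total': [25, 22, 21, 20, 30],
})

# Hypothetical variables holding the values to filter on
topic = 'A'
characteristic = 'Population'

# @name inside the query string refers to a variable in the calling scope
by_query = test_df.query("Topic == @topic and Characteristic == @characteristic")

# The same filter written as an explicit boolean mask
by_mask = test_df[(test_df['Topic'] == topic)
                  & (test_df['Characteristic'] == characteristic)]

print(by_query.equals(by_mask))
```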

1 Comment

I'm glad you included the query example. While it's 'only' syntactic sugar, it makes code much more readable.
