How to implement where clause in python

Question

I want to replicate what a SQL WHERE clause does, using Python. The conditions in a WHERE clause are often complex and combine several tests. I am able to do it in the following way, but I think there should be a smarter way to achieve this. Here are my data and code.

My requirement is: I want to select all columns, but only for the rows where the first letter of the address is 'N'. This is the initial data frame.

d = {'name': ['john', 'tom', 'bob', 'rock', 'dick'], 'Age': [23, 32, 45, 42, 28], 'YrsOfEducation': [10, 15, 8, 12, 10], 'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
import pandas as pd
df = pd.DataFrame(data = d)
df['col1'] = df['Address'].str[0:1] #creating a new column which will have only the first letter from address column
n = df['col1'] == 'N' #creating a filtering criteria where the letter will be equal to N
newdata = df[n] # filtering the dataframe 
newdata1 = newdata.drop('col1', axis = 1) # finally dropping the extra column 'col1'

So after 7 lines of code I am getting this output:

My question is: how can I do this more efficiently, or is there a smarter way to do it?

Chris · Accepted Answer · 2018-04-07 18:54:03Z

A new column is not necessary:

newdata = df[df['Address'].str[0] == 'N']  # filtering the dataframe
print(newdata)
  Address  Age  YrsOfEducation  name
0      NY   23              10  john
1      NJ   32              15   tom
3      NY   42              12  rock
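As a further sketch that is not part of the original answer, the same filter can also be written with `Series.str.startswith`, which reads closer to SQL's `LIKE 'N%'`. The `na=False` argument is an assumption added here so that missing addresses, if any, count as non-matches; the sample data has none.

```python
import pandas as pd

# Same sample data as in the question.
d = {'name': ['john', 'tom', 'bob', 'rock', 'dick'],
     'Age': [23, 32, 45, 42, 28],
     'YrsOfEducation': [10, 15, 8, 12, 10],
     'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
df = pd.DataFrame(data=d)

# Boolean mask: True where Address starts with 'N'.
# na=False makes missing values evaluate to False instead of NaN.
mask = df['Address'].str.startswith('N', na=False)

# Index the frame with the mask; no helper column is needed.
newdata = df[mask]
print(newdata)
```

The mask is an ordinary boolean Series, so it can be stored in a variable and reused or combined with other conditions.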

edited Apr 7, 2018 at 18:54

Chris

answered Apr 7, 2018 at 18:42

jezrael

2 Comments

singularity2047 Over a year ago

Thanks to both of you, it is indeed a better way. However, will this work if I have multiple conditions in the where clause? I tried the following but got an error: TypeError: cannot compare a dtyped [int64] array with a scalar of type [bool]. Code: newdata4 = df[df['Address'].str[0] == 'N' & df['Age'] > 30]

jezrael Over a year ago

@singularity2047 - You are really close, you only need () around each condition, like newdata4 = df[(df['Address'].str[0] == 'N') & (df['Age'] > 30)]
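The parentheses matter because Python gives `&` higher precedence than `==` and `>`, so without them `'N' & df['Age']` is evaluated first. Below is a minimal sketch of combining conditions the way a SQL WHERE clause would, reusing the data frame from the question. The variable names (`starts_with_n`, `over_30`, and so on) are illustrative and do not come from the thread, and the `DataFrame.query` line is an alternative added here, not something the answer proposed.

```python
import pandas as pd

# Same sample data as in the question.
d = {'name': ['john', 'tom', 'bob', 'rock', 'dick'],
     'Age': [23, 32, 45, 42, 28],
     'YrsOfEducation': [10, 15, 8, 12, 10],
     'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
df = pd.DataFrame(data=d)

# Build each condition as its own boolean Series.
starts_with_n = df['Address'].str[0] == 'N'
over_30 = df['Age'] > 30

# AND: both conditions must hold (rows for tom and rock).
both = df[starts_with_n & over_30]

# OR: at least one condition holds (john, tom, bob, rock).
either = df[starts_with_n | over_30]

# NOT: invert a condition with ~ (bob and dick).
not_n = df[~starts_with_n]

# IN: membership test, similar to SQL's IN (...) (john, rock, dick).
ny_or_ca = df[df['Address'].isin(['NY', 'CA'])]

# Inline form: every comparison wrapped in its own parentheses.
newdata4 = df[(df['Address'].str[0] == 'N') & (df['Age'] > 30)]

# Alternative for plain column comparisons: a WHERE-like string
# passed to DataFrame.query (tom and rock).
educated_over_30 = df.query("Age > 30 and YrsOfEducation >= 12")

print(newdata4)
```

Naming the masks first avoids the precedence problem altogether, since `&`, `|` and `~` are then applied to finished boolean Series.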
