Check if String in List of Strings is in Pandas DataFrame Column

Question

I am new to Python and have a question regarding matching strings in a list to a column in a df.

When I run the following commands, I would like a new column named "Match" to be created, and if there is a match between the string in the list and the string in the column the value in the "Match" column and corresponding row should be True, if no match, then False. The desired outcome would be False, False, True, False, False. Since the string "Honda" is not an exact match of "Honda Civic" it should not be True. Same with "Toy" is not the exact match of "Toyota Corolla".

Creating df:

Cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4', np.nan],
    'Price': [22000,25000,27000,35000, 29000],
    'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}

df = DataFrame(Cars,columns= ['Brand', 'Price', 'Liscence Plate'])

I then create a list of the values I would like to search for, joined by |.

search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2019']
pattern = '|'.join(search_for_these_values)

Here I have tried the str.match command and are given True, True, True, False, False.

df['Match'] = df["Brand"].str.match(pattern, na=False)

Here I have created a loop using the == operator and are given False, False, False, False, False.

for i in range(0,len(pattern)):
    df['Match'] = df['Brand'] == pattern[i]

Thank you for the help!

you are looking for isin: df.Brand.isin(search_for_these_values) — anky
– anky, Commented May 16, 2019 at 13:53

jezrael · Accepted Answer · 2019-05-16 14:56:57Z

9

If need match values in list, use Series.isin:

df['Match'] = df["Brand"].isin(search_for_these_values)
print (df)
            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123  False
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987  False
4             NaN  29000        DEF 456  False

Solution with match is used for check substrings, so different output.

Alternative solution for match substrings with Series.str.contains and parameter na=False:

df['Match'] = df["Brand"].str.contains(pattern, na=False)
print (df)
            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123   True
1  Toyota Corolla  25000        XYZ 789   True
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987  False
4             NaN  29000        DEF 456  False

EDIT:

For test values in substrings is possible use list comprehension with loop by values in search_for_these_values and test match by in with any for return at least one True:

df['Match'] = [any(x in z for z in search_for_these_values) 
                                if x == x 
                                else False 
                                for x in df["Brand"]]
print (df)

            Brand  Price Liscence Plate  Match
0     Honda Civic  22000        ABC 123  False
1  Toyota Corolla  25000        XYZ 789  False
2      Ford Focus  27000        CBA 321   True
3         Audi A4  35000        ZYX 987   True
4             NaN  29000        DEF 456  False

edited May 16, 2019 at 14:56

answered May 16, 2019 at 13:52

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

BKlassen Over a year ago

Okay great, thank you so much for the help! Another quick question, if I wanted to display true if the entire string in search_for_these_values is within the string in the column, which command would I use. For ex. the string "Audi A4 2019" to display True since the entire "Audi A4" string is within "Audi A4 2019". The desired outcome would be False, False, True, True, False.

jezrael Over a year ago

hmmm, more complicated like seems.

jezrael Over a year ago

@BradKlassen - Added solution to answer.

BKlassen Over a year ago

Great thanks for being so helpful, much appreciated!

Collectives™ on Stack Overflow

Check if String in List of Strings is in Pandas DataFrame Column

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related