pandas dataframe-python check if string exists in another column ignoring upper/lower case

Question

I have the same dataframe as in my earlier question (pandas dataframe check if column contains string that exists in another column):

Name       Description
Am         Owner of Am
BQ         Employee at bq  
JW         Employee somewhere

I want to check if the name is also a part of the description, and if so keep the row. If it's not, delete the row. In this case, it will delete the 3rd row (JW Employee somewhere)

I am using

df[df.apply(lambda x: x['Name'] in x['Description'], axis = 1)]

In this case, it also deletes the BQ row, because "bq" is in lowercase in the description. Is there any way to use the same syntax while ignoring case?

ggorlen · Accepted Answer · 2020-08-12 05:47:04Z

4

Use .lower() to make it case-agnostic:

df[df.apply(lambda x: x['Name'].lower() in x['Description'].lower(), axis=1)]

Note that this will consider "am" as a match on "amy". You may wish to use word boundaries to prevent this:

>>> import re
>>> def name_matches(x):
...     # (?i) ignores case; \b restricts the match to whole words
...     return bool(re.search(rf"(?i)\b{x['Name']}\b", x["Description"]))
...
>>> df[df.apply(name_matches, axis=1)]
  Name     Description
0   Am     Owner of Am
1   BQ  Employee at bq

Or use split, which handles regex special characters better:

df[df.apply(lambda x: x["Name"].lower() in x["Description"].lower().split(), axis=1)]
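If the regex route is preferred, one possible way to cope with special characters is to escape the name before building the pattern. The following is only a sketch: the helper name contains_word is hypothetical (it is not part of the answer above), and the lookarounds are swapped in for \b so that names ending in a non-word character can still match.

```python
import re

def contains_word(name: str, description: str) -> bool:
    """Case-insensitive whole-word search that tolerates regex metacharacters."""
    # re.escape neutralises characters such as "+" or "." in the name.
    # The lookarounds replace \b so that a name ending in a non-word
    # character (for example "C++") can still be matched as a whole word.
    pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
    return re.search(pattern, description, flags=re.IGNORECASE) is not None

print(contains_word("Am", "Owner of Am"))        # True
print(contains_word("Am", "Owner of Amy"))       # False
print(contains_word("C++", "writes c++ daily"))  # True
```

It could then be applied row by row in the same way as the other solutions, e.g. df[df.apply(lambda x: contains_word(x["Name"], x["Description"]), axis=1)].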

edited Aug 12, 2020 at 5:47

answered Aug 12, 2020 at 5:22

ggorlen


3 Comments

MrNobody33 Over a year ago

The problem with the first solution will be that Am is in Owner of Am and in Owner of Amy too, and in the second case it will not give the desired output, IIUC.

ggorlen Over a year ago

OP hasn't really specified that there's a problem with substrings--that's just a guess/suggestion on my part. The second will only fail if there are regex special characters in the string. You can use split which is probably better. How else would it fail, though, out of curiosity?

MrNobody33 Over a year ago

That's true, you're right, the OP didn't specify the problem. But if that's the case, I mean, if it's about the substrings, the first one will not work. And no, the second is awesome and it will not fail in the guessed substring problem :)!

sushanth · Accepted Answer · 2020-08-12 05:49:20Z

3

You should use

df[df.apply(lambda x: x['Name'] in x['Description'].split(' '), axis = 1)]

edited Aug 12, 2020 at 5:49

sushanth


answered Aug 12, 2020 at 5:22

Arpit


2 Comments

Arpit Over a year ago

Yes, .lower() should also be added along with .split(). Here it is: df[df.apply(lambda x: x['Name'].lower() in x['Description'].lower().split(' '), axis = 1)]. Thanks @ggorlen

sushanth Over a year ago

By default, split splits on whitespace, so .split() would be sufficient.
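The difference between the two calls can be illustrated with a small sketch. The sample string below is an assumption, chosen to include trailing spaces like the BQ row in the question.

```python
description = "Employee at bq  "  # note the two trailing spaces

# split(' ') cuts at every single space, so extra spaces produce empty strings.
by_space = description.split(' ')
print(by_space)       # ['Employee', 'at', 'bq', '', '']

# split() with no argument splits on runs of any whitespace and drops the empties.
by_whitespace = description.split()
print(by_whitespace)  # ['Employee', 'at', 'bq']
```

Both forms find "bq" here, but .split() also copes with tabs, newlines and repeated spaces between words.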

MrNobody33 · Accepted Answer · 2020-08-12 06:07:59Z

3

You can use lower, split and isin:

msk = df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()).any(axis=1)
df[msk]

Output:

  Name     Description
0   Am     Owner of Am
1   BQ  Employee at bq

Details
First we use str.lower to convert the strings to lower case:

print(df.Description.str.lower())
0           owner of am
1        employee at bq
2    employee somewhere
Name: Description, dtype: object

Then we split the strings and expand the lists:

print(df.Description.str.lower().str.split(expand=True))
          0          1     2
0     owner         of    am
1  employee         at    bq
2  employee  somewhere  None

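Then we use isin to check which words equal the lowercased df.Name. Because a Series is passed, pandas aligns it on the index, so each row is compared with its own name: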

print(df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()))
       0      1      2
0  False  False   True
1  False  False   True
2  False  False  False

Finally, we apply any along axis 1 (row-wise) to see whether at least one word matched:

print(df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()).any(axis=1))
0     True
1     True
2    False
dtype: bool
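The isin step in this walkthrough depends on what is passed to it. The sketch below uses hypothetical data (the third description is changed so that it mentions another row's name) to contrast a Series argument, which pandas aligns on the index, with a plain list, which is matched against every row.

```python
import pandas as pd

# Hypothetical data: the third row mentions another row's name ("Am").
df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW"],
    "Description": ["Owner of Am", "Employee at bq", "Employee of Am"],
})

words = df["Description"].str.lower().str.split(expand=True)
names = df["Name"].str.lower()

# Series argument: aligned on the index, so row i is compared with names[i] only.
aligned = words.isin(names).any(axis=1)

# List argument: every word is checked against all names at once.
pooled = words.isin(names.tolist()).any(axis=1)

print(aligned.tolist())  # [True, True, False]
print(pooled.tolist())   # [True, True, True]
```

With the Series form, the JW row is dropped even though its description contains "Am", which is the behaviour asked for in the question. Note that this alignment requires the index of df to be unique.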

edited Aug 12, 2020 at 6:07

answered Aug 12, 2020 at 5:36

MrNobody33


1 Comment

MrNobody33 Over a year ago

Thanks @ggorlen, yes, just edited the answer, you can compare the column without tolist.
