
I have the same dataframe as in my earlier question (pandas dataframe check if column contains string that exists in another column):

Name       Description
Am         Owner of Am
BQ         Employee at bq  
JW         Employee somewhere

I want to check whether the name is also part of the description: if it is, keep the row; if not, delete it. In this case, only the 3rd row (JW, Employee somewhere) should be deleted.

I am using

df[df.apply(lambda x: x['Name'] in x['Description'], axis = 1)]
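For reference, here is a minimal runnable reproduction of the question's setup, assuming the dataframe has exactly the two string columns shown above:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW"],
    "Description": ["Owner of Am", "Employee at bq", "Employee somewhere"],
})

# Case-sensitive substring test, exactly as in the question
mask = df.apply(lambda x: x["Name"] in x["Description"], axis=1)
result = df[mask]
print(result)
# "BQ" is not a substring of "Employee at bq", so only the Am row survives
```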

In this case, it also deletes the BQ row, because "bq" is lowercase in the description. Is there any way to use the same syntax but make the comparison case-insensitive?

3 Answers

4

Use .lower() to make it case-agnostic:

df[df.apply(lambda x: x['Name'].lower() in x['Description'].lower(), axis=1)]
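A self-contained version of the lowercase fix, rebuilding the question's sample data (this assumes both columns hold only strings, since `.lower()` would raise on a NaN value):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW"],
    "Description": ["Owner of Am", "Employee at bq", "Employee somewhere"],
})

# Lowercase both sides so that "BQ" matches "bq"
mask = df.apply(lambda x: x["Name"].lower() in x["Description"].lower(), axis=1)
result = df[mask]
print(result)
```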

Note that this will consider "am" as a match on "amy". You may wish to use word boundaries to prevent this:

>>> import re
>>> def name_matches(x):
...     return bool(re.search(rf"(?i)\b{x['Name']}\b", x["Description"]))
...
>>> df[df.apply(name_matches, axis=1)]
  Name     Description
0   Am     Owner of Am
1   BQ  Employee at bq
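The `\b` pattern above interpolates the name directly, so any regex metacharacters in a name would be read as regex syntax. A hedged sketch of one workaround (not part of the original answer): escape the name with `re.escape` and use the lookarounds `(?<!\w)` and `(?!\w)` instead of `\b`, since `\b` cannot match after a name that ends in a non-word character such as `+`. The `C++` and `Owner of Amy` rows are hypothetical test data, and `name_in_description` is an illustrative helper name.

```python
import re

import pandas as pd

# Hypothetical rows: "C++" contains regex metacharacters,
# "Owner of Amy" checks that partial words are rejected
df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW", "C++", "Am"],
    "Description": [
        "Owner of Am", "Employee at bq", "Employee somewhere",
        "Written in c++", "Owner of Amy",
    ],
})

def name_in_description(row):
    # re.escape makes the name match literally; the lookarounds require
    # that the match is not preceded or followed by a word character
    pattern = rf"(?<!\w){re.escape(row['Name'])}(?!\w)"
    return re.search(pattern, row["Description"], flags=re.IGNORECASE) is not None

result = df[df.apply(name_in_description, axis=1)]
print(result)
```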

Or use split(), which avoids the regex special-character problem entirely:

df[df.apply(lambda x: x["Name"].lower() in x["Description"].lower().split(), axis=1)]
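A runnable version of the split approach, using the question's data plus one hypothetical `Owner of Amy` row to show that whole-word matching rejects partial words:

```python
import pandas as pd

# Question's data plus a hypothetical partial-word row
df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW", "Am"],
    "Description": ["Owner of Am", "Employee at bq", "Employee somewhere", "Owner of Amy"],
})

# Check the lowercased name against the list of lowercased words
mask = df.apply(lambda x: x["Name"].lower() in x["Description"].lower().split(), axis=1)
result = df[mask]
print(result)
```

One trade-off to keep in mind: punctuation stays attached to the words, so a description such as "Owner of Am." produces the token "am." and would not match.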

3 Comments

The problem with the first solution is that "Am" matches both "Owner of Am" and "Owner of Amy", and the second one will not give the desired output either, IIUC.
OP hasn't really specified that there's a problem with substrings; that's just a guess/suggestion on my part. The second will only fail if there are regex special characters in the name. You can use split, which is probably better. Out of curiosity, how else would it fail?
That's true, you're right, the OP didn't specify that problem. But if substrings are the concern, the first one will not work. And no, the second is great, and it won't fail on the guessed substring problem :)!
3

You should use

df[df.apply(lambda x: x['Name'] in x['Description'].split(' '), axis = 1)]

2 Comments

Yes, .lower() should also be added along with .split(). Here it is: df[df.apply(lambda x: x['Name'].lower() in x['Description'].lower().split(' '), axis = 1)]. Thanks @ggorlen
With no arguments, split() splits on any run of whitespace, so .split() would be sufficient.
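`split(' ')` and `split()` are not fully interchangeable: with no argument, Python splits on runs of any whitespace and drops empty strings, which matters when a description contains repeated spaces or tabs. The string below is hypothetical:

```python
text = "Employee  at\tbq"  # hypothetical: two spaces and a tab

by_space = text.split(" ")      # splits on every single space
by_whitespace = text.split()    # splits on any run of whitespace

print(by_space)       # ['Employee', '', 'at\tbq']
print(by_whitespace)  # ['Employee', 'at', 'bq']
print("bq" in by_space, "bq" in by_whitespace)  # False True
```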
3

You can use str.lower, str.split and isin:

msk = df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()).any(axis=1)
df[msk]
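The same mask as a self-contained script, split into named steps and using the keyword form `axis=1` for `any` (the intermediate name `words` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW"],
    "Description": ["Owner of Am", "Employee at bq", "Employee somewhere"],
})

# One lowercased word per column; shorter rows are padded with missing values
words = df.Description.str.lower().str.split(expand=True)
# Compare each word with the lowercased Name of the same row,
# then keep rows where at least one word matched
msk = words.isin(df.Name.str.lower()).any(axis=1)
print(df[msk])
```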

Output:

  Name     Description
0   Am     Owner of Am
1   BQ  Employee at bq

Details
First, we use str.lower to convert the strings to lowercase:

print(df.Description.str.lower())
0           owner of am
1        employee at bq
2    employee somewhere
Name: Description, dtype: object

Then we split the strings and expand the lists:

print(df.Description.str.lower().str.split(expand=True))
          0          1     2
0     owner         of    am
1  employee         at    bq
2  employee  somewhere  None

Then we use isin to check which words equal the (lowercased) df.Name value of the same row:

print(df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()))
   0      1      2
0  False  False   True
1  False  False   True
2  False  False  False

Finally, we apply any along axis 1 (row-wise) to see whether at least one word matched:

print(df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()).any(axis=1))
0     True
1     True
2    False
dtype: bool
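The isin step behaves differently depending on what it receives. With a Series, `DataFrame.isin` aligns on the index, so each row is compared only with its own name; with a plain list, it is a membership test against every name in the column. The two-row dataframe below is hypothetical, chosen so that row 1's description mentions row 0's name but not its own:

```python
import pandas as pd

# Hypothetical data: "Reports to am" mentions Am, not BQ
df = pd.DataFrame({
    "Name": ["Am", "BQ"],
    "Description": ["Owner of Am", "Reports to am"],
})
words = df.Description.str.lower().str.split(expand=True)
names = df.Name.str.lower()

# Series argument: aligned on the index, i.e. row-wise comparison
row_wise = words.isin(names).any(axis=1)
# List argument: each word is checked against all names
any_name = words.isin(names.tolist()).any(axis=1)

print(row_wise.tolist())  # [True, False]
print(any_name.tolist())  # [True, True]
```

For the question's "name appears in its own description" requirement, passing the Series is the variant that matches.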

1 Comment

Thanks @ggorlen, yes. I just edited the answer: you can compare against the column directly, without tolist.
