How to check if string in list of strings is in pandas dataframe column

Question

I'm doing text analysis. My task is to count how many times each 'bad word' in a list appears in the strings of a DataFrame column. One idea is to use .isin() or .str.contains() to check word by word, but the word list has over 40,000 entries, so such a loop would be too slow. Is there a better way to do this?

Go the opposite way: split the string into words and check whether each word is in the "naughty list", keeping track of counts (and catching a lot of ValueErrors resulting from words not being in there). This should be significantly faster if the strings aren't too long.
– Lukas Thaler, commented Nov 14, 2019 at 14:22
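A minimal sketch of the comment's split-and-look-up idea, assuming whitespace tokenisation and single-word entries (punctuation is not stripped, so `darn!` would not match `darn`). Putting the list into a set makes each membership test an average O(1) hash lookup, so no ValueError handling is needed. The sample rows and words are made up for illustration:

```python
from collections import Counter

def count_bad_words(text, bad_set):
    """Count whitespace-separated tokens of `text` that appear in `bad_set`.

    Each `token in bad_set` check is a hash lookup, so the cost grows with
    the length of the text rather than with the size of the word list.
    Multi-word entries such as 'Bad Word' can never match a single token.
    """
    return Counter(token for token in text.split() if token in bad_set)

# Hypothetical sample data standing in for the DataFrame column.
rows = ['darn it darn', 'nothing to see', 'heck and darn']
bad_words = ['darn', 'heck']
bad_set = set(bad_words)  # build the set once, outside the loop

total = Counter()
for row in rows:
    total.update(count_bad_words(row, bad_set))

print(total)  # Counter({'darn': 3, 'heck': 1})
```

With pandas, the same loop could iterate over a column directly, e.g. `for row in df['text']:`, where `df` and `'text'` are hypothetical names.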

Gorlomi · Accepted Answer · 2019-11-14 14:43:36Z

Although you said a loop might be too slow, it does seem like the most efficient way given the size of the list. I tried to keep it as simple as possible; feel free to modify the print statement based on your needs.

text = 'Bad Word test for Terrible Word same as Horrible Word and NSFW Word and Bad Word again'
bad_words = ['Bad Word', 'Terrible Word', 'Horrible Word', 'NSFW Word']

# Collect [word, count] pairs; str.count counts non-overlapping substring matches
length_list = []

for word in bad_words:
    count = text.count(word)
    length_list.append([word, count])

print(length_list)

Output:

[['Bad Word', 2], ['Terrible Word', 1], ['Horrible Word', 1], ['NSFW Word', 1]]
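One caveat with `str.count`: it counts raw substrings, so an entry also matches inside a longer word. A sketch of a stricter count using word-boundary anchors (`\b`) from the standard `re` module, on a made-up string:

```python
import re

text = 'Bad Word and Bad Wordsmith'

# str.count matches raw substrings, so 'Bad Wordsmith' is counted too.
substring_count = text.count('Bad Word')

# \b anchors the match at word boundaries; re.escape protects any
# regex metacharacters that may appear in a bad-word entry.
pattern = re.compile(r'\b' + re.escape('Bad Word') + r'\b')
boundary_count = len(pattern.findall(text))

print(substring_count, boundary_count)  # 2 1
```

`\b` only behaves this way when an entry starts and ends with a word character; entries that begin or end with punctuation would need a different anchor.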

Alternatively, you can print each count as a line of text:

for word in bad_words:
    count = text.count(word)
    print(f'{word} count: {count}')

Output:

Bad Word count: 2
Terrible Word count: 1
Horrible Word count: 1
NSFW Word count: 1
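For the original 40,000-word case, one alternative (not the approach this answer uses) is to compile the whole list into a single alternation regex once and run it over each row, collecting counts in a `Counter`. This is a sketch; `build_pattern` and `count_in_rows` are hypothetical helper names, and whether it outperforms the per-word loop on a list that large is something to measure on real data:

```python
import re
from collections import Counter

def build_pattern(bad_words):
    """Compile one alternation regex for the whole bad-word list.

    Longer entries go first because Python's re tries alternatives left
    to right, so a longer phrase is preferred over a shorter prefix of it.
    """
    ordered = sorted(set(bad_words), key=len, reverse=True)
    alternation = '|'.join(map(re.escape, ordered))
    return re.compile(r'\b(?:' + alternation + r')\b')

def count_in_rows(rows, pattern):
    """Sum the matches of every bad word across an iterable of strings."""
    total = Counter()
    for row in rows:
        total.update(pattern.findall(row))  # non-capturing group -> full matches
    return total

bad_words = ['Bad Word', 'Terrible Word', 'Horrible Word', 'NSFW Word']
rows = [  # hypothetical rows standing in for the DataFrame column
    'Bad Word test for Terrible Word same as Horrible Word and NSFW Word and Bad Word again',
    'no bad words here',
    'Bad Word',
]
pattern = build_pattern(bad_words)
result = count_in_rows(rows, pattern)
print(result)
# Counter({'Bad Word': 3, 'Terrible Word': 1, 'Horrible Word': 1, 'NSFW Word': 1})
```

Since iterating over a pandas Series yields its values, `count_in_rows(df['text'], pattern)` should work for a column, with `df` and `'text'` as hypothetical names. Matching is case-sensitive as written; passing `re.IGNORECASE` as the second argument to `re.compile` would relax that.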

edited Nov 14, 2019 at 14:43

answered Nov 14, 2019 at 14:29

Gorlomi
