I would like to check whether each word in the labels list exists in each list in the column 'bigrams'.

If one of these words exists in the bigram list, I would like to replace the label None with the word that exists.

I tried writing two consecutive for loops, but it doesn't work. I also tried a list comprehension.

How can I do this?

1 Answer

You can use pd.Series.str.extract, which returns the first regex match in each row:

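import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'cat'], ['cow']], 'label': [None, None, None]})
df
#             bgrams label
#0  [hello, goodbye]  None
#1        [dog, cat]  None
#2             [cow]  None

labels = ['cat', 'goodbye']

# Alternation of all labels in one capture group: '(cat|goodbye)'
regex = '(' + '|'.join(labels) + ')'

# Cast each list to its string form, then extract the first label found.
# expand=False returns a Series rather than a one-column DataFrame.
df['label'] = df.bgrams.astype(str).str.extract(regex, expand=False)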

Output:

df
             bgrams    label
0  [hello, goodbye]  goodbye
1        [dog, cat]      cat
2             [cow]      NaN
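One caveat with the pattern above: it matches substrings, so a label such as 'cat' would also be found inside 'category', and labels containing regex metacharacters would be misread. A minimal sketch of a stricter pattern, assuming whole-word matching is what is wanted (the helper name build_label_regex is hypothetical, not part of pandas):

```python
import re

def build_label_regex(labels):
    # Escape each label so characters such as '.' are matched literally,
    # and wrap the alternation in \b so only whole words match.
    alternatives = "|".join(re.escape(label) for label in labels)
    return r"\b(" + alternatives + r")\b"

labels = ["cat", "goodbye"]
regex = build_label_regex(labels)

# str() of a list looks like "['dog', 'cat']", the same text astype(str) yields per row.
print(re.findall(regex, str(["dog", "cat"])))
print(re.findall(regex, str(["category", "cow"])))
```

The resulting pattern can be passed to str.extract or str.findall in place of the simpler regex.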

For multiple matches, you can use pd.Series.str.findall:

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']], 'label': [None, None, None]})
df
#                  bgrams label
#0  [hello, goodbye, cat]  None
#1             [dog, cat]  None
#2                  [cow]  None

labels = ['cat', 'goodbye']

regex = '(' + '|'.join(labels) + ')'

df['label'] = df.bgrams.astype(str).str.findall(regex)

Output:

df
                  bgrams           label
0  [hello, goodbye, cat]  [goodbye, cat]
1             [dog, cat]           [cat]
2                  [cow]              []
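Since each row already holds a list, the same result can probably be obtained without converting to strings at all. A rough sketch in plain Python, assuming exact matches between list items and labels are what is needed (the function names matching_labels and first_label are made up for illustration):

```python
def matching_labels(words, labels):
    # Keep the words of one row that are also labels, in row order.
    label_set = set(labels)
    return [word for word in words if word in label_set]

def first_label(words, labels):
    # First matching word, or None when the row contains no label.
    matches = matching_labels(words, labels)
    return matches[0] if matches else None

labels = ["cat", "goodbye"]
rows = [["hello", "goodbye", "cat"], ["dog", "cat"], ["cow"]]

all_matches = [matching_labels(row, labels) for row in rows]
first_matches = [first_label(row, labels) for row in rows]
print(all_matches)
print(first_matches)
```

On the DataFrame above, this could be applied with df['label'] = df['bgrams'].apply(lambda words: matching_labels(words, labels)).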

2 Comments

You're welcome @LJRB, if it works for you, consider accepting the answer, thanks :)
It works, but not for all the words in my list. When I create the regex variable, some words in the new list are inverted. For example, if I have 'guerre virus', the two words are swapped, so it may not find 'guerre virus'. I would like to find either 'guerre virus' or 'virus guerre'.
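One possible way to match a bigram regardless of word order is to compare sorted word tuples instead of raw strings. This is only a sketch, assuming each bigram is stored as a single string of space-separated words; order_key and labels_any_order are hypothetical helper names:

```python
def order_key(phrase):
    # 'guerre virus' and 'virus guerre' both become ('guerre', 'virus').
    return tuple(sorted(phrase.split()))

def labels_any_order(bigrams, labels):
    # Map each order-independent key to the label as written in `labels`.
    by_key = {order_key(label): label for label in labels}
    return [by_key[order_key(b)] for b in bigrams if order_key(b) in by_key]

labels = ["guerre virus"]
print(labels_any_order(["virus guerre", "dog cat"], labels))
print(labels_any_order(["guerre virus"], labels))
```

Both calls return the label as it is spelled in the labels list, whichever order the row uses.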
