You can cast each list to its string representation and then use pd.Series.str.extract:
import pandas as pd
df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
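# expand=False returns a Series (first match per row) instead of a one-column DataFrame
df['label']=df.bgrams.astype(str).str.extract(regex, expand=False)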
Output:
df
bgrams label
0 [hello, goodbye] goodbye
1 [dog, cat] cat
2 [cow] NaN
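One caveat: the pattern is matched against the string form of the whole list, so a label such as 'cat' would also match inside 'category', and labels containing regex metacharacters would be misread. A minimal sketch of a stricter pattern, assuming the labels are plain words (the 'category' row is made up for illustration):

```python
import re
import pandas as pd

# Hypothetical data: 'category' contains the label 'cat' as a substring.
df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'category'], ['cow']]})
labels = ['cat', 'goodbye']

# Escape each label and require word boundaries on both sides,
# so 'cat' no longer matches inside 'category'.
regex = r'\b(' + '|'.join(map(re.escape, labels)) + r')\b'

# expand=False gives a Series holding the first match per row (NaN if none).
df['label'] = df['bgrams'].astype(str).str.extract(regex, expand=False)
print(df)
```

With this pattern only the first row gets a label; the 'category' row stays NaN.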
For multiple matches, you can use pd.Series.str.findall:
df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
# bgrams label
#0 [hello, goodbye, cat] None
#1 [dog, cat] None
#2 [cow] None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.findall(regex)
Output:
df
bgrams label
0 [hello, goodbye, cat] [goodbye, cat]
1 [dog, cat] [cat]
2 [cow] []
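If you only need exact matches against list elements, you can also skip the string cast and the regex. This is a different technique from str.findall, sketched here as an alternative: it filters each list against a set of labels and so avoids accidental substring matches.

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']]})
labels = {'cat', 'goodbye'}  # a set makes the membership test cheap

# Keep the elements of each list that are in the label set, in their original order.
df['label'] = df['bgrams'].apply(lambda grams: [g for g in grams if g in labels])
print(df)
```

The resulting label column holds the same lists as the findall version for this data, with an empty list where nothing matches.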