Colleagues,
Maybe you can help me with what appears to be a simple task, but I am not yet experienced enough to figure it out.
Let's say we have two dataframes:
- df1 contains substrings;
- df2 contains longer blocks of text, some of them contain substrings from df1.
import pandas as pd
df1 = {'subst': ['LONDON BRIDGE', 'TRUE GRIT', 'FIVE TIMES FIVE', 'THREE TIMES DEAD', 'TRUE IS NOT', 'OH NO', 'LEBRON JAMES']}
df2 = {'strng': ['LEBRON JAMES SCORED 20', 'THREE TIMES DEAD JOHNY WAS HELL OF THE COOK', 'TRUE IS NOT WHAT YOU THINK', 'FIVE TIMES FIVE IS NOT WHAT LEBRON SCORED']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
Here is what I need:
- I need to iterate through the rows to check whether any substring in df1['subst'] is present anywhere in df2['strng'].
- If one is present, I want a new column ['match_df1'] in df2 that contains the matching substring value from df1.
The final output in df2 would look something like this:
| strng | match_df1 |
|---|---|
| LEBRON JAMES SCORED 20 | LEBRON JAMES |
| THREE TIMES DEAD JOHNY WAS HELL OF THE COOK | THREE TIMES DEAD |
| TRUE IS NOT WHAT YOU THINK | TRUE IS NOT |
| FIVE TIMES FIVE IS NOT WHAT LEBRON SCORED | FIVE TIMES FIVE |
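
For reference, here is a minimal sketch of one way the table above might be produced. It does not loop over rows explicitly. Instead it joins the substrings into a single regular expression and uses `Series.str.extract`. It assumes the df1 entry is meant to read 'THREE TIMES DEAD' (so that it matches the expected output), that only the first match per row is wanted, and that matching is case-sensitive. The variable names `subs` and `pattern` are just illustrative.

```python
import re
import pandas as pd

# Sample data from the question; 'THREE TIMES DEAD' is assumed to be the
# intended spelling so that it agrees with the expected output.
df1 = pd.DataFrame({'subst': ['LONDON BRIDGE', 'TRUE GRIT', 'FIVE TIMES FIVE',
                              'THREE TIMES DEAD', 'TRUE IS NOT', 'OH NO',
                              'LEBRON JAMES']})
df2 = pd.DataFrame({'strng': ['LEBRON JAMES SCORED 20',
                              'THREE TIMES DEAD JOHNY WAS HELL OF THE COOK',
                              'TRUE IS NOT WHAT YOU THINK',
                              'FIVE TIMES FIVE IS NOT WHAT LEBRON SCORED']})

# List longer substrings first: at a given starting position the regex engine
# tries alternatives in order, so this prefers the longer candidate.
subs = sorted(df1['subst'], key=len, reverse=True)

# re.escape keeps any regex metacharacters in the substrings literal.
pattern = '(' + '|'.join(re.escape(s) for s in subs) + ')'

# With one capture group and expand=False, str.extract returns a Series
# holding the first match in each row, or NaN where nothing matches.
df2['match_df1'] = df2['strng'].str.extract(pattern, expand=False)

print(df2)
```

Rows of df2 that contain none of the substrings would get NaN in match_df1. If several substrings can occur in one row and all of them are needed, `Series.str.findall` with the same pattern might be a better fit.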