
This is my dataframe:

import pandas as pd
df = pd.DataFrame({'a': ['axy a', 'xyz b'], 'b': ['obj e', 'oaw r']})

and I have a list of strings:

s1 = 'lorem obj e'
s2 = 'lorem obj e lorem axy a'
s3 = 'lorem xyz b lorem oaw r'
s4 = 'lorem lorem oaw r'
s5 = 'lorem lorem axy a lorem obj e'
s_all = [s1, s2, s3, s4, s5]

Now I want to take every row and check whether the values of both columns are present in any of the strings in s_all. For example, for the first row I take 'axy a' and 'obj e' and check whether both of them occur in each string of s_all. Both occur in s2 and s5.

The outcome that I want looks like this:

       a      b      c
0  axy a  obj e  lorem obj e lorem axy a
1  axy a  obj e  lorem lorem axy a lorem obj e
2  xyz b  oaw r  lorem xyz b lorem oaw r

Here is my attempt, but it didn't work:

l = []
for sentence in s_all:
    for i in range(len(df)):
        if df.a.values[i] in sentence and df.b.values[i] in sentence:
            l.append(sentence)
        else:
            l.append(np.nan)

I tried to append the results to a list and then use that list to create the c column that I want, but it didn't work.

3 Answers

You can build a new Series with apply and explode, then concatenate it with your DataFrame:

match_series = df.apply(lambda row: [s for s in s_all if row['a'] in s and row['b'] in s], axis=1).explode()
pd.concat([df, match_series], axis=1)

Output

       a      b                              0
0  axy a  obj e        lorem obj e lorem axy a
0  axy a  obj e  lorem lorem axy a lorem obj e
1  xyz b  oaw r        lorem xyz b lorem oaw r
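
The output above has a third column named 0 and a repeated index, while the question asks for a column named c and a fresh 0..2 index. A minimal sketch of one way to get there, assuming the same df and s_all as in the question; using join instead of concat here is my own substitution, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'a': ['axy a', 'xyz b'], 'b': ['obj e', 'oaw r']})
s_all = [
    'lorem obj e',
    'lorem obj e lorem axy a',
    'lorem xyz b lorem oaw r',
    'lorem lorem oaw r',
    'lorem lorem axy a lorem obj e',
]

# One list of matching sentences per row, exploded to one sentence per entry,
# and named 'c' so the joined column gets the desired label.
match_series = df.apply(
    lambda row: [s for s in s_all if row['a'] in s and row['b'] in s],
    axis=1,
).explode().rename('c')

# Join on the index (each row of df is repeated once per match),
# then renumber the rows from 0.
result = df.join(match_series).reset_index(drop=True)
print(result)
```

The join aligns on the original row index, so a row with two matches appears twice before the index is reset.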

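Because the values of a and b in one row can match several of the reference strings, you need to repeat those values once per match. The lists l_a and l_b collect them, and a new dataframe df_new is built from the three lists. A small modification of your for loop is enough: iterate over the rows in the outer loop and append only when both values are found.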

l = []
l_a = []
l_b = []
for i in range(len(df)):
    for sentence in s_all:
        if df.a.values[i] in sentence and df.b.values[i] in sentence:
            l.append(sentence)
            l_a.append(df.a.values[i])
            l_b.append(df.b.values[i])

df_new = pd.DataFrame({'a' : l_a, 'b' : l_b, 'c' : l})

This yields

       a      b                              c
0  axy a  obj e        lorem obj e lorem axy a
1  axy a  obj e  lorem lorem axy a lorem obj e
2  xyz b  oaw r        lorem xyz b lorem oaw r
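
The same nested loop can also be written as a single list comprehension that collects (a, b, sentence) tuples. This is only a sketch of an equivalent formulation, assuming the df and s_all from the question; the name rows is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': ['axy a', 'xyz b'], 'b': ['obj e', 'oaw r']})
s_all = [
    'lorem obj e',
    'lorem obj e lorem axy a',
    'lorem xyz b lorem oaw r',
    'lorem lorem oaw r',
    'lorem lorem axy a lorem obj e',
]

# Outer loop over the rows of df, inner loop over the sentences;
# keep a tuple only when both column values occur in the sentence.
rows = [
    (a, b, sentence)
    for a, b in zip(df['a'], df['b'])
    for sentence in s_all
    if a in sentence and b in sentence
]

df_new = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(df_new)
```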


You can write a small helper function and apply it row by row to your df:

def func(row):
    out = []
    a, b = row 
    for s in s_all:
        if all([a in s, b in s]):
            out.append(s)
    return out

# If you have more than 2 columns, or don't know how many, here is a more general version.
# Apart from checking every value in the row, it is the same function as above.
def func(row):
    out = [] 
    for s in s_all:
        if all([string in s for string in row.tolist()]):
            out.append(s)
    return out

df['c'] = df.apply(func, axis=1)

Or as a one-liner with a lambda function:

df['c'] = df.apply(lambda row: [s for s in s_all if all(string in s for string in row.tolist())], axis=1)

The function returns a list of matching strings for each row. To give each list element its own row, use explode:

df = df.explode(column='c')
print(df)

Output:

       a      b                              c
0  axy a  obj e        lorem obj e lorem axy a
0  axy a  obj e  lorem lorem axy a lorem obj e
1  xyz b  oaw r        lorem xyz b lorem oaw r
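
One case the sample data does not cover is a row with no match at all: its list is empty, and explode turns an empty list into NaN. The sketch below shows one possible way to drop such rows and renumber the index; the third row ('qqq z', 'www y') and the name find_matches are made up for illustration and are not part of the question:

```python
import pandas as pd

# Same data as the question, plus one hypothetical row that matches nothing.
df = pd.DataFrame({'a': ['axy a', 'xyz b', 'qqq z'],
                   'b': ['obj e', 'oaw r', 'www y']})
s_all = [
    'lorem obj e',
    'lorem obj e lorem axy a',
    'lorem xyz b lorem oaw r',
    'lorem lorem oaw r',
    'lorem lorem axy a lorem obj e',
]

def find_matches(row):
    # Keep the sentences that contain every value of the row.
    return [s for s in s_all if all(value in s for value in row.tolist())]

df['c'] = df.apply(find_matches, axis=1)

result = (
    df.explode('c')            # one row per matching sentence, NaN for empty lists
      .dropna(subset=['c'])    # discard rows that had no match
      .reset_index(drop=True)  # renumber rows from 0
)
print(result)
```

Whether unmatched rows should be dropped or kept with NaN depends on what you need downstream; leaving out the dropna call keeps them.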
