Problem description: I need to set a variable for each row, but only if the value in one column appears in the list stored in a second column of the same row.

Sample Dataframe:

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'], 'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

I need to retrieve all rows where col1 is part of col2. Expected result:

col1    col2
'A'     'A, B, C'
'P'     'G, H, I, P'

My approach, which raises a TypeError saying that Series objects are mutable and cannot be hashed:

df[df['col2'].str.match(df['col1'])]

As far as I understand, I have to indicate somehow that the comparison should be done within each row. I know iterrows would be a solution, but I would prefer something without looping.

    What does it mean for strings to be "within the range"? Is Z "within the range" of [M, N, R, ZGTR]? What are the criteria for inclusion? Commented Jun 19, 2020 at 7:08
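The distinction raised in this comment can be made concrete with a small sketch. It assumes the sample data from the question; the names substring_mask and token_mask are only illustrative. A plain substring test accepts Z because it occurs inside ZGTR, while a test against the comma-separated items does not, which matches the expected result in the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

# Substring test: 'Z' counts as a match because it occurs inside 'ZGTR'.
substring_mask = [c1 in c2 for c1, c2 in zip(df['col1'], df['col2'])]

# Item test: 'Z' is not a match because no whole item equals 'Z'.
token_mask = [c1 in c2.split(', ') for c1, c2 in zip(df['col1'], df['col2'])]

print(substring_mask)
print(token_mask)
```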

1 Answer

Use a list comprehension that splits each col2 value and tests membership with in:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})
# Keep rows whose col1 value equals one of the comma-separated items in col2.
df = df[[b in a.split(', ') for a, b in df[['col2', 'col1']].values]]
print(df)
  col1        col2
0    A     A, B, C
2    P  G, H, I, P
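As a variation on the same idea, the sketch below splits col2 once with Series.str.split and builds a boolean Series aligned to the frame's index, so the original frame is left untouched. The names items, mask, and result are illustrative, and the comparison is still a Python-level loop over the rows rather than a truly vectorized operation.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})

# Split every col2 string into a list of items.
items = df['col2'].str.split(', ')

# Row-wise membership test, stored as a boolean Series sharing df's index.
mask = pd.Series([value in parts for value, parts in zip(df['col1'], items)],
                 index=df.index)

# Filter into a new frame; df itself keeps all four rows.
result = df[mask]
print(result)
```

Keeping the mask as a separate Series makes it possible to reuse it, for example to assign a flag column instead of filtering.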

2 Comments

Your solution works for the given example. The problem is that I have longer strings than those in the example. I will adjust.
@himself - So the test has to be done against the values split by a comma? Then split(', ') was added.
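If the real data uses longer items or inconsistent spacing around the commas, splitting on the exact string ', ' may miss matches. The sketch below is one possible adjustment, not part of the original answer: the sample data and the helper contains_item are hypothetical, and it assumes that surrounding whitespace should be ignored.

```python
import re

import pandas as pd

# Hypothetical data with multi-character items and uneven spacing.
df = pd.DataFrame({'col1': ['AB', 'T', 'P'],
                   'col2': ['AB,CD ,  EF', 'D, E, F', ' G,H , P ']})


def contains_item(item, items):
    """Return True if item equals one of the comma-separated entries in items."""
    # Split on commas together with any whitespace around them.
    tokens = re.split(r'\s*,\s*', items.strip())
    return item.strip() in tokens


mask = [contains_item(a, b) for a, b in zip(df['col1'], df['col2'])]
result = df[mask]
print(result)
```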
