
I'm struggling with a DataFrame-related problem. There are two DataFrames, df and dff, defined as below:

import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
            ['row1', 1, 2],
            ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:,1:].astype(int), index=data[1:,0],columns=data[0,1:])


filters=np.array([['', 'col1', 'col2'],
                 ['row1', 1, 1],
                 ['row2', 1, 2],
                 ['row3', 3, 2]])
dff = pd.DataFrame(data=filters[1:,1:].astype(int), index=filters[1:,0],columns=filters[0,1:])
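np.array coerces a mix of strings and integers into a single string dtype, which is why the numeric part is converted with .astype(int) before building each frame. The sketch below checks that the frame built this way holds the same values as one built directly from a dict. The name df_direct is illustrative only, not from the question.

```python
import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
                 ['row1', 1, 2],
                 ['row2', 3, 4]])
# Mixing strings and ints makes NumPy store every element as a string (kind 'U')
print(data.dtype.kind)

df = pd.DataFrame(data=data[1:, 1:].astype(int), index=data[1:, 0], columns=data[0, 1:])

# The same values built directly, with no string round-trip (illustrative name)
df_direct = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["row1", "row2"])

# Compare contents via plain dicts so platform-specific int widths don't matter
same = df.to_dict() == df_direct.to_dict()
print(same)
```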

I wish to select the rows of df whose col2 value belongs to the list of col2 values found in the dff rows with a matching col1 value. For example, for col1 equal to 1 that list should be [1, 2], and for col1 equal to 3 it is [2].
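A minimal sketch of that selection written as a plain lookup, assuming the values from the question: a dictionary maps each col1 value in dff to the set of its col2 values, and a row of df is kept only when its col2 is in the set for its col1 (a col1 value absent from dff keeps nothing). The names allowed and mask are illustrative. With this data only row1 survives, since col1 = 1 allows 2, while col1 = 3 allows only 2 and row2 has 4.

```python
import pandas as pd

# Same values as in the question, built directly
df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["row1", "row2"])
dff = pd.DataFrame({"col1": [1, 1, 3], "col2": [1, 2, 2]},
                   index=["row1", "row2", "row3"])

# col1 value -> set of col2 values seen with it in dff
allowed = {key: set(group) for key, group in dff.groupby("col1")["col2"]}

# True where df's col2 is among the values allowed for that row's col1
mask = [c2 in allowed.get(c1, set()) for c1, c2 in zip(df["col1"], df["col2"])]
df1 = df.loc[mask]
print(df1)
```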

My best attempt to solve this is

df1 = df[df['col2'].isin(dff[dff['col1']==df['col1']]['col2'])]

But that raises:

ValueError: Can only compare identically-labeled Series objects
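The error comes from dff['col1'] == df['col1']: pandas compares two Series element by element only when their index labels match, and here df is indexed row1 to row2 while dff is indexed row1 to row3. Even with identical labels, that comparison would pair rows by label instead of looking up matching col1 values, so it would not express the intended filter. A short sketch reproducing the error, assuming the question's values:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["row1", "row2"])
dff = pd.DataFrame({"col1": [1, 1, 3], "col2": [1, 2, 2]},
                   index=["row1", "row2", "row3"])

# The indexes differ (2 labels vs 3), so == between the columns cannot align
print(df.index.equals(dff.index))

try:
    _ = dff["col1"] == df["col1"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```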

Any help would be appreciated. Thanks so much.

1 Answer


As far as I understand, you can simply aggregate the col2 values of dff per col1 value:

ndf = dff.groupby('col1').agg(lambda x: list(x)).reset_index()

    col1   col2
0   1      [1, 2]
1   3      [2]

and then drop the col1 values that do not appear in df:

ndf[ndf.col1.isin(df.col1)]
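The aggregation gives the allowed col2 values per col1, but that last line returns rows of ndf rather than rows of df. As an alternative (a different technique from the groupby above), the selection can be written as an inner merge on both columns, which acts as a semi-join: a df row is kept when its (col1, col2) pair occurs in dff. A minimal sketch, assuming the question's values; merge drops df's index, so it is carried through as a column and restored.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["row1", "row2"])
dff = pd.DataFrame({"col1": [1, 1, 3], "col2": [1, 2, 2]},
                   index=["row1", "row2", "row3"])

# Distinct (col1, col2) pairs, so the merge cannot duplicate df rows
pairs = dff[["col1", "col2"]].drop_duplicates()

# Inner merge on both columns keeps only df rows whose pair exists in dff;
# reset_index stores df's unnamed index in a column called "index"
df1 = (
    df.reset_index()
      .merge(pairs, on=["col1", "col2"], how="inner")
      .set_index("index")
)
df1.index.name = None
print(df1)
```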

1 Comment

Thanks a lot, that's what I needed. I also learned something new.
