Python DataFrame - Select dataframe rows based on values in another dataframe

Question

I'm struggling with a dataframe-related problem. There are two dataframes, df and dff, defined as below

import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
            ['row1', 1, 2],
            ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:,1:].astype(int), index=data[1:,0],columns=data[0,1:])


filters=np.array([['', 'col1', 'col2'],
                 ['row1', 1, 1],
                 ['row2', 1, 2],
                 ['row3', 3, 2]])
dff = pd.DataFrame(data=filters[1:,1:].astype(int), index=filters[1:,0],columns=filters[0,1:])

I wish to select the rows of df whose col2 value belongs to the list of col2 values found in dff for the matching col1 value. For example, for col1 equal to 1 that list should be [1, 2], and for col1 equal to 3 the list is [2].

My best attempt to solve this is

df1 = df[df['col2'].isin(dff[dff['col1']==df['col1']]['col2'])]

But that results in

ValueError: Can only compare identically-labeled Series objects

Any help would be appreciated. Thanks so much.

rafaelc · Accepted Answer · 2018-06-17 16:32:06Z


As far as I understand, you can simply aggregate

ndf = dff.groupby('col1').agg(lambda x: list(x)).reset_index()

    col1   col2
0   1      [1, 2]
1   3      [2]

and then filter out the rows whose col1 value does not appear in df

ndf[ndf.col1.isin(df.col1)]
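
The aggregation above can also be carried one step further to select the rows of df themselves, which is what the question asks for. The following is only a sketch under assumptions: the names allowed, mask and selected are illustrative and not part of the original answer, and df and dff are rebuilt directly from dictionaries with the same values as in the question.

```python
import pandas as pd

# Same values as df and dff in the question, built without the NumPy detour.
df = pd.DataFrame({'col1': [1, 3], 'col2': [2, 4]}, index=['row1', 'row2'])
dff = pd.DataFrame({'col1': [1, 1, 3], 'col2': [1, 2, 2]},
                   index=['row1', 'row2', 'row3'])

# Map each col1 value in dff to the list of col2 values seen with it
# (hypothetical helper name: allowed).
allowed = dff.groupby('col1')['col2'].agg(list).to_dict()

# Keep a row of df only if its col2 is in the list for its own col1;
# a col1 value missing from dff gets an empty list and is dropped.
mask = [c2 in allowed.get(c1, []) for c1, c2 in zip(df['col1'], df['col2'])]
selected = df.loc[mask]

print(selected)
```

With the data from the question, only row1 survives: its col2 value 2 is in the list for col1 equal to 1, while row2 has col2 equal to 4, which is not in the list for col1 equal to 3.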


1 Comment

Marina Over a year ago

thanks a lot, that's what I needed, I also learned something new.
