Pandas - Comparing two Dataframes using a sub-string of one Dataframe column

Question

I am trying to compare two Dataframes by matching a sub-string of a column in one Dataframe against a column in the other.

My data looks like this:

Dataframe 1

prod_name, prod_id, prod_category
prod_1, cate_1000101, category_1 
prod_2, cate_123001, category_2
prod_3, cate_900, category_3
prod_4, cate_808, category_4

Dataframe 2

bill_id, bill_date, prod_ref
101, 2021-01-01, 3001
102, 2021-01-01, 5001
103, 2021-01-01, 8080

I want to check whether any part of prod_id from Dataframe 1 appears in prod_ref in Dataframe 2.

Expected output:

prod_name, prod_id, bill_id, bill_date, prod_ref
prod_2, cate_123001, 101, 2021-01-01, 3001
prod_4, cate_808, 103, 2021-01-01, 8080

Is there a limit on how short the substring match should be? It seems prod_ref=5001 could also be matched with any prod_id containing 1, e.g. prod_id=cate_1000101.
– tdy, Commented Apr 16, 2021 at 4:10
When you say any part of prod_id, is there a minimum number of digits that you are willing to compare? 808 is in 8080, but the '01' from the end of '123001' is in '3001' and '5001'.
– Derek O, Commented Apr 16, 2021 at 4:17
@KevinNash oh nice! When you are able to, you should accept your own answer so that people who have the same question get directed to the right answer. Cheers!
– Derek O, Commented Apr 19, 2021 at 15:13

Kevin Nash · Accepted Answer · 2021-04-19 07:11:01Z

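I was able to get the required output using the following. It extracts the first run of digits from prod_id and prod_ref and merges on those values (astype(str) covers the case where prod_ref is stored as an integer):

df1.merge(df2, left_on = df1.prod_id.str.extract(r'(\d+)', expand = False), right_on = df2.prod_ref.astype(str).str.extract(r'(\d+)', expand = False), how = 'left')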
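
A merge keyed on extracted digits pairs rows only when the digit strings are identical, whereas the expected output in the question pairs 123001 with 3001 and 808 with 8080, which are partial matches. Below is a minimal sketch of one alternative, not a definitive solution. It assumes a match means that the full digit string of one value is contained in the other, which also sidesteps the short-substring ambiguity raised in the comments. The helper column names id_digits and ref_str are hypothetical, and how="cross" needs pandas 1.2 or later.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "prod_name": ["prod_1", "prod_2", "prod_3", "prod_4"],
    "prod_id": ["cate_1000101", "cate_123001", "cate_900", "cate_808"],
    "prod_category": ["category_1", "category_2", "category_3", "category_4"],
})
df2 = pd.DataFrame({
    "bill_id": [101, 102, 103],
    "bill_date": ["2021-01-01"] * 3,
    "prod_ref": [3001, 5001, 8080],
})

# Helper columns (hypothetical names): digits of prod_id, prod_ref as text
left = df1.assign(id_digits=df1["prod_id"].str.extract(r"(\d+)", expand=False))
right = df2.assign(ref_str=df2["prod_ref"].astype(str))

# Build every (product, bill) pair, then keep the pairs that match
pairs = left.merge(right, how="cross")

# Assumed rule: one full digit string must appear inside the other
mask = [a in b or b in a for a, b in zip(pairs["id_digits"], pairs["ref_str"])]

cols = ["prod_name", "prod_id", "bill_id", "bill_date", "prod_ref"]
result = pairs.loc[mask, cols].reset_index(drop=True)
print(result)
```

On the sample data this keeps the prod_2/101 and prod_4/103 rows. A cross join grows with the product of the two row counts, so for large frames it may be worth filtering candidates first.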

