4

I have two Dataframes with different columns and size.

The first one has some columns and one of them is a string field (column 1). The second dataframe has 2 columns, one is a string field (column 4) with 2 words separated by comma and another is a integer field (column 5).

I need to verify that column 1 in Dataframe 1 has the words in column 4 Dataframe 2 and fill in Dataframe 1 with the corresponding information in dataframe 2.

Example:

df1
    column 1                                column 2          column 3
0   bla bla sample1 bla bla sample2         a                 f
1   bla bla sample1 bla bla sample5         b                 g
2   bla bla sample3 bla bla sample4         c                 h
3   bla bla sample8 bla bla sample7         d                 i
4   bla bla sample1 bla bla sample2         e                 j

 df2
    column 4                       column 5
0   ('sample1', 'sample2'),        50
1   ('sample3', 'sample4'),        35 
2   ('sample1', 'sample5')         18

I need the output:

Output:
df1
    column 1                               column 2  column 3  column 4                     column 5
0   bla bla sample1 bla bla sample2        a         f         ('sample1', 'sample2')     50
1   bla bla sample1 bla bla sample5        b         g         ('sample1', 'sample5')     18
2   bla bla sample3 bla bla sample4        c         h         ('sample3', 'sample4')     35
3   bla bla sample8 bla bla sample7        d         i         NaN        
4   bla bla sample1 bla bla sample2        e         j         ('sample1', 'sample2')     50

Any ideas?

Thanks!

6
  • What have you tried so far? Commented Mar 1, 2019 at 15:23
  • Two questions. First, is column 4 a string separated by commas 'sample1, sample2' as you indicate in the question, or a tuple of two strings, as you have in your sample. Second, is the requirement that both strings be found in column1? Commented Mar 1, 2019 at 15:36
  • 1
    I can only think of O(n,m) solution .. Commented Mar 1, 2019 at 15:39
  • @ALollz First - Sorry, it's a tuple of two strings. Second - Yes, both strings be found in column 1. Commented Mar 1, 2019 at 15:54
  • 1
    @ALollz when column 1 has more than one combination, column 4 must have all combinations and add the value of column 5. Example: Output - Df1: Column 1 - bla bla sample1 bla sample2 bla sample5 Column 4 - sample1, sample2, sample5 Column 5 - 68 Commented Mar 1, 2019 at 16:10

1 Answer 1

1

I don't guarantee this will be particularly fast, but it gets the job done. We'll use set logic to check for matches. We have to jump through some hoops so that we can store a list of tuples of the matches. I don't think this is a particularly good idea.

import numpy as np
import pandas as pd

df1['setc'] = df1['column 1'].str.split().apply(set)
# Initialize so addition works
df1['column 4'] = [[] for i in range(len(df1))]
df1['column 5'] = 0

for idx, row in df2.iterrows():
    m = (df1.setc.values & set(row['column 4'])) == set(row['column 4'])
    df1.loc[m, 'column 4'] += pd.Series([[row['column 4']] for x in range(len(m))])[m]
    df1.loc[m, 'column 5'] += row['column 5']

df1 = df1.drop(columns='setc')
# NaN where nothing matched
df1.loc[df1['column 4'].str.len().eq(0), ['column 4', 'column 5']] = np.NaN

Output:

                          column 1 column 2 column 3                                  column 4  column 5
0  bla bla sample1 sample5 sample2        a        f  [(sample1, sample2), (sample1, sample5)]      68.0
1  bla bla sample1 bla bla sample5        b        g                      [(sample1, sample5)]      18.0
2  bla bla sample3 bla bla sample4        c        h                      [(sample3, sample4)]      35.0
3  bla bla sample8 bla bla sample7        d        i                                       NaN       NaN
4  bla bla sample1 bla bla sample2        e        j                      [(sample1, sample2)]      50.0
Sign up to request clarification or add additional context in comments.

1 Comment

Worked perfectly! Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.