I have two dataframes. df1 looks like this:

id  year    CalendarWeek    DayName interval    counts
1   2014    1   sun 10:30   3
1   2014    1   sun 11:30   4
1   2014    2   wed 12:00   5
1   2014    2   fri 9:00    2
2   2014    1   sun 13:00   3
2   2014    1   sun 14:30   1
2   2014    1   mon 10:30   2
2   2014    2   wed 14:00   3
2   2014    2   fri 15:00   5
3   2014    1   thu 16:30   2
3   2014    1   thu 17:00   1
3   2014    2   sat 12:00   2
3   2014    2   sat 13:30   3

And df2 looks like this:

id  year    CalendarWeek    DayName interval    NewCounts
1   2014    1   sun 10:00   2
1   2014    1   sun 10:30   4
1   2014    1   sun 11:30   5
1   2014    2   wed 10:30   6
1   2014    2   wed 12:00   3
1   2014    2   fri 8:30    1
1   2014    2   fri 9:00    2
2   2014    1   sun 12:30   3
2   2014    1   sun 13:00   4
2   2014    1   sun 14:30   4
2   2014    1   mon 9:00    35
2   2014    1   mon 10:30   1
2   2014    2   wed 12:30   23
2   2014    2   wed 14:00   4
2   2014    2   fri 15:00   3
3   2014    1   thu 14:30   1
3   2014    1   thu 15:00   3
3   2014    1   thu 16:30   34
3   2014    1   thu 17:00   5
3   2014    2   sat 12:00   3
3   2014    2   sat 13:30   4
3   2014    2   sat 14:00   2

I want to pick all rows in df2 that match df1 on the columns id, year, CalendarWeek, DayName and interval. The result I want should look like this:

id  year    CalendarWeek    DayName interval    NewCounts
1   2014    1   sun 10:30   4
1   2014    1   sun 11:30   5
1   2014    2   wed 12:00   3
1   2014    2   fri 9:00    2
2   2014    1   sun 13:00   4
2   2014    1   sun 14:30   4
2   2014    1   mon 10:30   1
2   2014    2   wed 14:00   4
2   2014    2   fri 15:00   3
3   2014    1   thu 16:30   34
3   2014    1   thu 17:00   5
3   2014    2   sat 12:00   3
3   2014    2   sat 13:30   4

In Python, how can I select these specific rows from one dataframe based on columns in another dataframe?

Thank you!

1 Answer

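Perform a merge and pass the list of columns to the on parameter. The default merge type is 'inner', which keeps only the rows whose key values exist in both dfs. Judging by the column order in the output, df in the session below is the frame holding counts and df1 is the frame holding NewCounts: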

In [2]:

df.merge(df1, on=['id','year','CalendarWeek','DayName','interval'])
Out[2]:
    id  year  CalendarWeek DayName interval  counts  NewCounts
0    1  2014             1     sun    10:30       3          4
1    1  2014             1     sun    11:30       4          5
2    1  2014             2     wed    12:00       5          3
3    1  2014             2     fri     9:00       2          2
4    2  2014             1     sun    13:00       3          4
5    2  2014             1     sun    14:30       1          4
6    2  2014             1     mon    10:30       2          1
7    2  2014             2     wed    14:00       3          4
8    2  2014             2     fri    15:00       5          3
9    3  2014             1     thu    16:30       2         34
10   3  2014             1     thu    17:00       1          5
11   3  2014             2     sat    12:00       2          3
12   3  2014             2     sat    13:30       3          4
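The merged output above carries both counts and NewCounts, while the question asked only for the matching rows of the NewCounts frame. One possible way to get that, sketched here on a small hand-picked subset of the question's data, is to merge against just the key columns of the other frame. The names keys and result, and the use of drop_duplicates as a guard against repeated keys, are my own assumptions and not part of the original answer.

```python
import pandas as pd

# Key columns shared by both frames (taken from the question).
keys = ["id", "year", "CalendarWeek", "DayName", "interval"]

# Small subset of the question's df1 (the frame with counts).
df1 = pd.DataFrame({
    "id": [1, 1, 2],
    "year": [2014, 2014, 2014],
    "CalendarWeek": [1, 1, 1],
    "DayName": ["sun", "sun", "mon"],
    "interval": ["10:30", "11:30", "10:30"],
    "counts": [3, 4, 2],
})

# Small subset of the question's df2 (the frame with NewCounts).
df2 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "year": [2014] * 5,
    "CalendarWeek": [1] * 5,
    "DayName": ["sun", "sun", "sun", "mon", "mon"],
    "interval": ["10:00", "10:30", "11:30", "9:00", "10:30"],
    "NewCounts": [2, 4, 5, 35, 1],
})

# Inner merge against the key columns only, so no counts column is pulled in.
# drop_duplicates keeps a repeated key in df1 from duplicating rows of df2.
result = df2.merge(df1[keys].drop_duplicates(), on=keys)
print(result)
```

Only the three df2 rows whose keys also appear in df1 survive, and the result has the same columns as df2.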

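If 'id' is your index rather than a column, you have to reset the index on both dfs first so that it becomes a regular column. This is needed because the inner join gives an incorrect result if you pass the on list of columns and also specify left_index=True and right_index=True. In the output below, every row for an id is paired with every row for the same id in the other df. (This output comes from the pandas version in use at the time; more recent releases appear to reject this combination of arguments with an error instead.)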

In [4]:

df.merge(df1, on=['year','CalendarWeek','DayName','interval'], left_index=True, right_index=True)
Out[4]:
    year  CalendarWeek DayName interval  counts  NewCounts
id                                                        
1   2014             1     sun    10:30       3          2
1   2014             1     sun    10:30       3          4
1   2014             1     sun    10:30       3          5
1   2014             1     sun    10:30       3          6
1   2014             1     sun    10:30       3          3
1   2014             1     sun    10:30       3          1
1   2014             1     sun    10:30       3          2
1   2014             1     sun    11:30       4          2
1   2014             1     sun    11:30       4          4
1   2014             1     sun    11:30       4          5
1   2014             1     sun    11:30       4          6
1   2014             1     sun    11:30       4          3
1   2014             1     sun    11:30       4          1
1   2014             1     sun    11:30       4          2
1   2014             2     wed    12:00       5          2
1   2014             2     wed    12:00       5          4
1   2014             2     wed    12:00       5          5
1   2014             2     wed    12:00       5          6
1   2014             2     wed    12:00       5          3
1   2014             2     wed    12:00       5          1
1   2014             2     wed    12:00       5          2
1   2014             2     fri     9:00       2          2
1   2014             2     fri     9:00       2          4
1   2014             2     fri     9:00       2          5
1   2014             2     fri     9:00       2          6
1   2014             2     fri     9:00       2          3
1   2014             2     fri     9:00       2          1
1   2014             2     fri     9:00       2          2
2   2014             1     sun    13:00       3          3
2   2014             1     sun    13:00       3          4
..   ...           ...     ...      ...     ...        ...
2   2014             2     fri    15:00       5          4
2   2014             2     fri    15:00       5          3
3   2014             1     thu    16:30       2          1
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2         34
3   2014             1     thu    16:30       2          5
3   2014             1     thu    16:30       2          3
3   2014             1     thu    16:30       2          4
3   2014             1     thu    16:30       2          2
3   2014             1     thu    17:00       1          1
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1         34
3   2014             1     thu    17:00       1          5
3   2014             1     thu    17:00       1          3
3   2014             1     thu    17:00       1          4
3   2014             1     thu    17:00       1          2
3   2014             2     sat    12:00       2          1
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2         34
3   2014             2     sat    12:00       2          5
3   2014             2     sat    12:00       2          3
3   2014             2     sat    12:00       2          4
3   2014             2     sat    12:00       2          2
3   2014             2     sat    13:30       3          1
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3         34
3   2014             2     sat    13:30       3          5
3   2014             2     sat    13:30       3          3
3   2014             2     sat    13:30       3          4
3   2014             2     sat    13:30       3          2

[96 rows x 6 columns]
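The 96 rows are consistent with a join on the index alone: the question's data has 4, 5 and 4 rows per id in one frame and 7, 8 and 7 in the other, and 4*7 + 5*8 + 4*7 = 96. The toy example below is only a sketch of that effect with made-up frames (left, right and by_index are hypothetical names); it joins on a non-unique index and shows the per-id row multiplication.

```python
import pandas as pd

# Two tiny frames indexed by a non-unique id (illustrative values only).
left = pd.DataFrame({"id": [1, 1, 2], "counts": [3, 4, 5]}).set_index("id")
right = pd.DataFrame({"id": [1, 1, 1, 2], "NewCounts": [2, 4, 5, 6]}).set_index("id")

# Joining on the index alone pairs every left row with every right row
# that shares the same id: 2 * 3 rows for id 1 plus 1 * 1 row for id 2.
by_index = left.merge(right, left_index=True, right_index=True)
print(len(by_index))  # 7

# Same arithmetic for the row counts per id in the question's data.
print(4 * 7 + 5 * 8 + 4 * 7)  # 96
```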

So to reset the index, just do df = df.reset_index(0) and likewise for the other df. After merging, you can set the index back to id like so:

merged = df.merge(df1, on=['id','year','CalendarWeek','DayName','interval'])
merged = merged.set_index('id')
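For completeness, here is a minimal sketch of the whole round trip, assuming id is the index on both frames: reset the index, merge on the full key list, then restore id as the index with set_index. The frames left and right are small stand-ins built from a few rows of the question's data, and their names are hypothetical.

```python
import pandas as pd

keys = ["id", "year", "CalendarWeek", "DayName", "interval"]

# Stand-in frames with id as the index (a few rows from the question).
left = pd.DataFrame({
    "id": [1, 2],
    "year": [2014, 2014],
    "CalendarWeek": [1, 1],
    "DayName": ["sun", "mon"],
    "interval": ["10:30", "10:30"],
    "counts": [3, 2],
}).set_index("id")

right = pd.DataFrame({
    "id": [1, 1, 2],
    "year": [2014, 2014, 2014],
    "CalendarWeek": [1, 1, 1],
    "DayName": ["sun", "sun", "mon"],
    "interval": ["10:00", "10:30", "10:30"],
    "NewCounts": [2, 4, 1],
}).set_index("id")

# Turn id back into a column on both sides, merge on all five keys,
# then make id the index again.
merged = (
    left.reset_index()
        .merge(right.reset_index(), on=keys)
        .set_index("id")
)
print(merged)
```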