Sorting one dataframe based on column in different dataframe and putting unmatched entries at bottom

Question

I have two dataframes that share an ID column. The first dataframe is split up and sent to the owners of the data for updates. Once returned, the pieces are put back together into a single dataframe. The updated dataframe now contains new entries that have no ID yet, and its rows are in a different order from the original. df is the old dataframe and df2 is the new one. I want to sort df2 based on the ID column in df and leave the new entries at the bottom. The IDs are randomly generated and have no inherent order, which is by design.

Is there a good way of doing that? I looked at this post, which uses indexing. I could make my ID column the index, but since some new entries do not have an ID yet, that would not work.

I have made a mockup of the situation here:

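import pandas as pd

df = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'], data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
# New entries (P4, P5, P6) have no UniqueID yet; None becomes NaN
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'], data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321], ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])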

Which results in these two dataframes:

df:
  Name  DataOwner  UniqueID
0   P1          1       123
1   P2          2       321
2   P3          3       456

df2:
  Name  DataOwner  UniqueID
0   P1          1     123.0
1   P4          1       NaN
2   P2          2     321.0
3   P5          2       NaN
4   P3          3     456.0
5   P6          3       NaN

The project names are descriptive text and cannot be used for sorting. The DataOwner column is not sorted; it is only there to illustrate that the data is returned per data owner and combined into one big dataframe before I need to sort it by ID, with the new entries at the bottom.

The result I want is:

df (unchanged):
  Name  DataOwner  UniqueID
0   P1          1       123
1   P2          2       321
2   P3          3       456

df2, sorted:
  Name  DataOwner  UniqueID
0   P1          1     123.0
2   P2          2     321.0
4   P3          3     456.0
1   P4          1       NaN
3   P5          2       NaN
5   P6          3       NaN

The order of the new entries does not matter, though; they just need to be at the bottom.
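For comparison, the index-based approach mentioned in the question can still be made to work if the rows without an ID are set aside first and appended afterwards. This is a minimal sketch under that assumption, not taken from the linked post; has_id, known, new_rows, ordered and result are illustrative names, and it assumes every ID in df still exists in df2 (.loc raises a KeyError otherwise).

```python
import pandas as pd

# Mock data from the question: df is the old frame, df2 the returned one.
df = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                  data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                   data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321],
                         ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])

# Split df2 into rows that already have an ID and new rows that do not.
has_id = df2['UniqueID'].notna()
known = df2[has_id]
new_rows = df2[~has_id]

# With the NaN rows gone, UniqueID can serve as the index: pick the known
# rows in df's ID order (assumes every ID in df still exists in df2),
# then turn the index back into a column in the original column order.
ordered = (known.set_index('UniqueID')
                .loc[df['UniqueID']]
                .reset_index()[df2.columns])

# Append the new rows at the bottom with a fresh 0..n-1 index.
result = pd.concat([ordered, new_rows], ignore_index=True)
print(result)
```

Unlike the answers below, this discards df2's original row labels; drop ignore_index=True if you want to keep the labels of the new rows.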

mozway · Accepted Answer · 2023-06-21 13:52:07Z

1

One option using a custom key in sort_values:

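# Map each ID in df to its row position, e.g. {123: 0, 321: 1, 456: 2}
key = pd.Series({k: v for v, k in enumerate(df['UniqueID'].unique())})

# Sort df2 by each ID's position in df; IDs not found in df (the NaN ones) go last
out = df2.sort_values(by='UniqueID', key=key.reindex, na_position='last')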
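To see what the key callable actually produces, here is a self-contained run of the same approach on the question's mock data. positions is a name introduced here only for illustration; sort_values passes the UniqueID column to the key as a Series and sorts by whatever the key returns.

```python
import pandas as pd

df = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                  data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                   data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321],
                         ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])

# Position of each ID in the old frame, indexed by ID: 123 -> 0, 321 -> 1, 456 -> 2.
key = pd.Series({k: v for v, k in enumerate(df['UniqueID'].unique())})

# key.reindex turns each ID of df2 into its position in df; IDs that are
# not in df, such as the NaN placeholders of the new entries, become NaN.
positions = key.reindex(df2['UniqueID'])
print(positions.tolist())

# sort_values sorts by those positions (ordering is positional, so the
# different index of the returned Series does not matter), and
# na_position='last' pushes the NaN positions to the bottom.
out = df2.sort_values(by='UniqueID', key=key.reindex, na_position='last')
print(out)
```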

Output:

  Name  DataOwner  UniqueID
0   P1          1     123.0
2   P2          2     321.0
4   P3          3     456.0
1   P4          1       NaN
3   P5          2       NaN
5   P6          3       NaN

edited Jun 21, 2023 at 13:52

answered Jun 21, 2023 at 13:33

mozway

2 Comments

Eztaban Over a year ago

This does work, but not if I use the column UniqueID, which is the only column I can be absolutely certain will be unique and unchanged between the two datasets. All the other columns may change when the data is updated. Thanks all the same. Edit: My bad, this works on the UniqueID column as well. Thank you :)

Eztaban Over a year ago

An important addition I found later: if I need to use the same indexing in both dataframes afterwards, I have to reset the index: final_sorted_w_reset_index = out.reset_index(drop=True). This ensures that my newly sorted dataframe has the same index at the same rows as the one I used to sort by. Otherwise, df[column][idx] and out[column][idx] would access different rows, because the index labels were sorted along with the rows but not reset.
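A small sketch of the misalignment described in this comment, using the question's mock data and the accepted answer's sort. It uses .loc[idx, column] in place of the chained df[column][idx] from the comment (both are label-based here); aligned is a name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                  data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                   data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321],
                         ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])

# Sort as in the accepted answer.
key = pd.Series({k: v for v, k in enumerate(df['UniqueID'].unique())})
out = df2.sort_values(by='UniqueID', key=key.reindex, na_position='last')

# out keeps df2's original row labels (0, 2, 4, 1, 3, 5), so label 1 is
# the new entry P4 in out but P2 in df.
print(out.loc[1, 'Name'], df.loc[1, 'Name'])

# reset_index(drop=True) discards the old labels and numbers the rows 0..n-1,
# so the first len(df) rows of the result line up with df label by label.
final_sorted_w_reset_index = out.reset_index(drop=True)
aligned = (df['UniqueID'] == final_sorted_w_reset_index['UniqueID'].iloc[:len(df)]).all()
print(aligned)
```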

Code Different · Accepted Answer · 2023-06-21 13:42:16Z

0

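CategoricalDtype is my go-to when I need a custom sort order. This assumes that the index of df2 is unique:

# Ordered categorical type whose category order is df's ID order
UniqueIDType = pd.CategoricalDtype(df["UniqueID"], ordered=True)
# Sort df2's IDs in that order (IDs not in df become NaN and go last) and keep the row labels
index = df2["UniqueID"].astype(UniqueIDType).sort_values().index

df2.reindex(index)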
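The same steps, unpacked with intermediate variables and run on the question's mock data. as_cat and result are names introduced here for illustration; the behaviour relied on is that casting to a categorical dtype turns values outside the categories into missing values, and that sorting an ordered categorical follows the category order with missing values last by default.

```python
import pandas as pd

df = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                  data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                   data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321],
                         ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])

# 1. An ordered categorical dtype whose categories are df's IDs, in df's
#    row order, so 123 < 321 < 456 regardless of their numeric values.
UniqueIDType = pd.CategoricalDtype(df["UniqueID"], ordered=True)

# 2. Cast df2's IDs to that dtype. Known IDs map to their category; values
#    that are not categories (the NaN placeholders) stay missing.
as_cat = df2["UniqueID"].astype(UniqueIDType)

# 3. Sorting follows the category order; missing values go last by default.
#    Only the resulting row labels are kept.
index = as_cat.sort_values().index

# 4. Reorder df2's rows by those labels (requires a unique index on df2).
result = df2.reindex(index)
print(result)
```

Because the order comes from the categories rather than from the ID values, this works even though the IDs are random.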

answered Jun 21, 2023 at 13:42

Code Different

1 Comment

Eztaban Over a year ago

This does work, thank you very much. Could you explain a little what is going on exactly?
