I have two dataframes that share an ID column. The first dataframe is split up and sent to the owners of the data for updates. Once the pieces are returned, they are put back together into a single dataframe. That new dataframe has been updated, contains new entries with no ID yet, and is in a different order from what it originally was. df1 is the old dataframe and df2 is the new one. I want to sort df2 based on the ID column in df1 and leave the new entries at the bottom. The IDs are randomly generated and have no order, which is by design.
Is there any good way of doing that? I looked at this post, which makes use of indexing. I could make my ID column the index, but as some new entries would not have an ID yet, that would not work.
I have made a mockup of the situation here:
import pandas as pd
df1 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'], data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'], data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321], ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])
Which results in these two dataframes:
  Name  DataOwner  UniqueID
0   P1          1       123
1   P2          2       321
2   P3          3       456

  Name  DataOwner  UniqueID
0   P1          1     123.0
1   P4          1       NaN
2   P2          2     321.0
3   P5          2       NaN
4   P3          3     456.0
5   P6          3       NaN
The project names are descriptive text and cannot be used for sorting. The DataOwner column is not sorted either; it is only there to illustrate that the data is returned per data owner and put together into one big dataframe before I need to sort it based on the ID, with new entries at the bottom.
The result I want is then:
  Name  DataOwner  UniqueID
0   P1          1       123
1   P2          2       321
2   P3          3       456

  Name  DataOwner  UniqueID
0   P1          1     123.0
2   P2          2     321.0
4   P3          3     456.0
1   P4          1       NaN
3   P5          2       NaN
5   P6          3       NaN
The order of the new entries does not matter; they just need to be at the bottom.
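To make the goal concrete, here is a rough sketch of the kind of thing I am after, not a confirmed solution. It assumes the IDs in the old dataframe are unique, and the names `position`, `order`, and `df2_sorted` are placeholders I made up for illustration. The idea is to look up each ID's row position in the old dataframe and sort the new dataframe by that position, with missing positions pushed to the end.

```python
import pandas as pd

# Mock data from the question; missing IDs are written as None (shown as NaN)
df1 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                   data=[['P1', 1, 123], ['P2', 2, 321], ['P3', 3, 456]])
df2 = pd.DataFrame(columns=['Name', 'DataOwner', 'UniqueID'],
                   data=[['P1', 1, 123], ['P4', 1, None], ['P2', 2, 321],
                         ['P5', 2, None], ['P3', 3, 456], ['P6', 3, None]])

# Assumption: IDs in df1 are unique, so each ID maps to exactly one position
position = pd.Series(range(len(df1)), index=df1['UniqueID'])

# Position of each df2 row in df1; rows whose ID is not in df1 get NaN
order = df2['UniqueID'].map(position)

# mergesort is stable, so the new rows keep their relative order at the bottom
sorted_index = order.sort_values(na_position='last', kind='mergesort').index
df2_sorted = df2.loc[sorted_index]

print(df2_sorted)
```

With the mock data this keeps the original index labels on the rows, so `reset_index(drop=True)` could be added if a fresh 0..n index is wanted.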