I am trying to solve the following problem.
I have the following dataframe df:
import pandas as pd
df = pd.DataFrame({'A': ['id1', 'id1', 'id2', 'id2', 'id2', 'id2', 'id2', 'id2', 'id2', 'id3', 'id3', 'id3'],
                   'B': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
                   'C': [101, 32, 10, 9, 15, 15, 15, 15, 15, 40, 36, 36]})
df
Out[16]:
A B C
0 id1 10 101
1 id1 11 32
2 id2 12 10
3 id2 13 9
4 id2 14 15
5 id2 15 15
6 id2 16 15
7 id2 17 15
8 id2 18 15
9 id3 19 40
10 id3 20 36
11 id3 21 36
I now wish to rearrange the dataframe such that the values in column C are sorted in ascending order for each subgroup defined by the id values in column A. I use the following piece of code:
df2 = df
df2 = df2.sort_values(by=['A','C'], ascending=True).groupby('A').head()
and I get this:
df2
Out[18]:
A B C
1 id1 11 32
0 id1 10 101
3 id2 13 9
2 id2 12 10
4 id2 14 15
5 id2 15 15
6 id2 16 15
10 id3 20 36
11 id3 21 36
9 id3 19 40
The values in C for subgroup id1 in column A are all sorted correctly, and so are those for subgroup id3. However, the result for id2 is missing two rows (the rows with index 7 and 8):
print(len(df.index), len(df2.index))
12 10
Any idea why this happens and how to fix it? Any help is very much appreciated.
Thanks, MarcoC
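head() returns only the first 5 rows by default (n=5), and after groupby('A') that limit applies to each group. id2 has 7 rows, so 2 of them are dropped. sort_values(by=['A','C']) on its own already sorts C within each value of A, so the groupby('A').head() call can simply be removed.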
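A minimal sketch of the fix, reusing the dataframe from the question. The name truncated is only for illustration and is not in the original post; it holds the result of the original call so the row counts can be compared side by side.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['id1', 'id1', 'id2', 'id2', 'id2', 'id2', 'id2', 'id2', 'id2', 'id3', 'id3', 'id3'],
    'B': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
    'C': [101, 32, 10, 9, 15, 15, 15, 15, 15, 40, 36, 36],
})

# Original approach: groupby('A').head() keeps at most n=5 rows per group,
# so id2 (7 rows) loses 2 rows.
truncated = df.sort_values(by=['A', 'C'], ascending=True).groupby('A').head()

# Fix: sorting on both columns already orders C within each A group,
# so no groupby/head step is needed.
df2 = df.sort_values(by=['A', 'C'], ascending=True)

print(len(df.index), len(truncated.index), len(df2.index))  # 12 10 12
```

If the groupby step is needed for some other reason, passing an explicit n to head() that is at least the size of the largest group, for example head(len(df)), would also keep every row.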