Sorting Pandas dataframe data within Groupby groups

Question

I have a large pandas dataframe that can be represented structurally as:

      id          date    status
0     12    2015-05-01         0
1     12    2015-05-22         1
2     12    2015-05-14         1
3     12    2015-05-06         0
4     45    2015-05-03         1
5     45    2015-05-12         1
6     45    2015-05-02         0
7     51    2015-05-05         1
8     51    2015-05-01         0
9     51    2015-05-23         1
10    51    2015-05-17         1
11    51    2015-05-03         0
12    51    2015-05-05         0
13    76    2015-05-04         1
14    76    2015-05-22         1
15    76    2015-05-08         0

It can be created in Python 3.4 using:

import pandas as pd

tempDF = pd.DataFrame({ 'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76],
                        'date': ['2015-05-01','2015-05-22','2015-05-14','2015-05-06','2015-05-03','2015-05-12','2015-05-02','2015-05-05','2015-05-01','2015-05-23','2015-05-17','2015-05-03','2015-05-05','2015-05-04','2015-05-22','2015-05-08'],
                        'status': [0,1,1,0,1,1,0,1,0,1,1,0,0,1,1,0]})
tempDF['date'] = pd.to_datetime(tempDF['date'])  # parse the strings as datetimes

I would like to divide the dataframe into groups by the 'id' column, sort the rows within each group by 'date', and then get the last 'status' value in each group.

So far, I have:

tempGrouped = tempDF.groupby('id')
tempGrouped['status'].last()

which produces:

id
12    0
45    0
51    0
76    0

However, the status should be 1 in each case (the value associated with the latest date). I can't work out how to sort the groups by date before selecting the last value. It's likely I'm a little snow-blind after trying to work this out for a while, so I apologise in advance if the solution is obvious.

steboc · Accepted Answer · 2015-06-29 17:01:44Z

14

You can sort first and then group, like this:

tempDF.sort(['id','date']).groupby('id')['status'].last()
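
On pandas versions where `DataFrame.sort` is no longer available, the same idea can be sketched with `sort_values`. The snippet rebuilds the frame from the question so that it runs on its own; the name `latest_status` is only illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
tempDF = pd.DataFrame({
    'id': [12, 12, 12, 12, 45, 45, 45, 51, 51, 51, 51, 51, 51, 76, 76, 76],
    'date': ['2015-05-01', '2015-05-22', '2015-05-14', '2015-05-06',
             '2015-05-03', '2015-05-12', '2015-05-02', '2015-05-05',
             '2015-05-01', '2015-05-23', '2015-05-17', '2015-05-03',
             '2015-05-05', '2015-05-04', '2015-05-22', '2015-05-08'],
    'status': [0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
})
tempDF['date'] = pd.to_datetime(tempDF['date'])

# Sort by id, then by date. groupby keeps the rows of each group in the
# order they have in the sorted frame, so last() picks the row with the
# latest date for every id.
latest_status = (
    tempDF.sort_values(['id', 'date'])
          .groupby('id')['status']
          .last()
)
print(latest_status)
```

With the question's data this should give a status of 1 for every id, which is the result the question asks for.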


3 Comments

user1718097 Over a year ago

Of course! I was stuck in the mindset of groupby-ing first and then sorting within groups. Thanks for the heads-up!

Monica Heddneck Over a year ago

Why is the opposite approach (sort, then group) any different from (group, then sort)? I wonder why the operations are not commutative.

Dane Macaulay Over a year ago

DataFrame.sort is deprecated, use: tempDF.sort_values(by=['id', 'date'])
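
On the question of why the order matters: `last()` is positional, so it returns whatever row happens to come last in each group as the frame is stored, and grouping by itself does not reorder rows by date. A minimal sketch on the first seven rows of the question's data is below. The variable names are illustrative, and the `idxmax` variant is one possible group-first alternative rather than something proposed in the thread.

```python
import pandas as pd

# First seven rows of the question's frame (ids 12 and 45).
df = pd.DataFrame({
    'id': [12, 12, 12, 12, 45, 45, 45],
    'date': pd.to_datetime(['2015-05-01', '2015-05-22', '2015-05-14',
                            '2015-05-06', '2015-05-03', '2015-05-12',
                            '2015-05-02']),
    'status': [0, 1, 1, 0, 1, 1, 0],
})

# Group only: last() takes the final stored row of each group,
# regardless of its date.
unsorted_last = df.groupby('id')['status'].last()

# Sort, then group: the final row of each group is now the latest date.
sorted_last = df.sort_values(['id', 'date']).groupby('id')['status'].last()

# Group-first alternative: look up the row label of the latest date in
# each group, then read the status from those rows.
latest_rows = df.loc[df.groupby('id')['date'].idxmax()]
by_idxmax = latest_rows.set_index('id')['status']

print(unsorted_last.tolist(), sorted_last.tolist(), by_idxmax.tolist())
```

Here the group-only result differs from the other two, while sorting first and the `idxmax` lookup agree with each other.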
