I have a DataFrame df sorted by person and time. The index has no duplicates and is not a continuous range starting from 0. For each row, I compare the time difference from the previous row against a threshold that depends on the previous row's product.
    person time_bought product
42    abby        2:21   fruit
12    abby        2:55   fruit
10    abby       10:35   other
3    barry       12:00   fruit
...
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
# map custom threshold based on previous row product
ref = df.groupby('person')['product'].shift().map(thresh)
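To make the setup reproducible, here is a minimal sketch built from only the four rows shown above. It assumes time_bought is stored as a Timedelta (the real column may be a datetime), and the rows hidden behind the "..." are left out.

```python
import pandas as pd

# Only the four sample rows shown above; storing time_bought as Timedelta is an assumption.
df = pd.DataFrame(
    {
        "person": ["abby", "abby", "abby", "barry"],
        "time_bought": pd.to_timedelta(
            ["02:21:00", "02:55:00", "10:35:00", "12:00:00"]
        ),
        "product": ["fruit", "fruit", "other", "fruit"],
    },
    index=[42, 12, 10, 3],  # unique, unsorted, not starting at 0
)

thresh = {"fruit": pd.Timedelta("10min"), "other": pd.Timedelta("2min")}

# Threshold of the previous row's product within each person.
# The first row of each person has no previous row, so it maps to a missing value.
ref = df.groupby("person")["product"].shift().map(thresh)
print(ref)
```

Here ref keeps df's full index (42, 12, 10, 3), including the non-fruit row.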
I don't understand why m1 (defined below) does not retain df's index order: the result comes back with a sorted index, starting from the lowest index value.
# compare each delta to the custom threshold.
m1 = df.loc[df['product'] == "fruit", 'time_bought'].groupby(df['person']).diff().gt(ref)
3 False
4 False
If I remove .gt(ref), I see only the filtered rows, and the original index order is retained:
df.loc[df['product'] == "fruit", 'time_bought'].groupby(df['person']).diff()
42               NaT
12   0 days 00:34:00
...
The sorted index messes up my next line, m1.cumsum(), which depends on the rows staying in their original order.
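One thing that might be relevant: .gt(ref) compares two Series whose indexes differ (the filtered rows versus all rows), and as far as I understand pandas aligns them on their labels before comparing. A sketch, under that assumption, that restricts ref to the filtered labels first so that both sides share the same index. The name delta is mine, the data is the four-row sample only, and time_bought as Timedelta is again an assumption.

```python
import pandas as pd

# Same four-row sample as in the question; Timedelta dtype is an assumption.
df = pd.DataFrame(
    {
        "person": ["abby", "abby", "abby", "barry"],
        "time_bought": pd.to_timedelta(
            ["02:21:00", "02:55:00", "10:35:00", "12:00:00"]
        ),
        "product": ["fruit", "fruit", "other", "fruit"],
    },
    index=[42, 12, 10, 3],
)
thresh = {"fruit": pd.Timedelta("10min"), "other": pd.Timedelta("2min")}
ref = df.groupby("person")["product"].shift().map(thresh)

# Time difference between consecutive fruit purchases of the same person.
delta = (
    df.loc[df["product"] == "fruit", "time_bought"]
    .groupby(df["person"])
    .diff()
)

# Give ref the same labels in the same order as delta, so no re-alignment is needed.
m1 = delta.gt(ref.reindex(delta.index))

print(m1)
print(m1.cumsum())
```

With the sample rows, m1 keeps the order 42, 12, 3, so the cumulative sum runs in the original row order.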