I have a pandas DataFrame that looks like this:

enter image description here

The rows are already arranged by the three columns O, A, N, but as you can see they are NOT sorted by the time column.

My goal is to sort by the time column within each (O, A, N) group, and then apply shift(-1) to the value column to create a value_next column.

The output should look like this (NaN is replaced with -1 for demonstration):

enter image description here

Here is what I tried:

import pandas as pd
  
# Initialize data to lists.
data = [{'time': 10, 'O': 1, 'A': 2, 'N': 3, 'value': 10},
        {'time': 7, 'O': 1, 'A': 2, 'N': 3, 'value': 11},
        {'time': 15, 'O': 1, 'A': 2, 'N': 3, 'value': 12},
        {'time': 11, 'O': 2, 'A': 2, 'N': 3, 'value': 20},
        {'time': 12, 'O': 2, 'A': 2, 'N': 3, 'value': 21},
        {'time': 1, 'O': 2, 'A': 2, 'N': 3, 'value': 25}]
  
# Creates DataFrame.
df = pd.DataFrame(data)
  
#sorting
df.sort_values(by=['O', 'A', 'N', 'time'], ascending=[True, True, True, True])

#shift
df['value_next'] = df.groupby(['O', 'A', 'N'])['value'].shift(-1)

This generates the output below, which differs from what I expected. What am I missing?

enter image description here

Please suggest.

1 Answer

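sort_values does not sort in place by default: it returns a new, sorted DataFrame and leaves df unchanged, so the later groupby/shift still runs on the unsorted rows. Either pass inplace=True: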

df.sort_values(['O','A', 'N', 'time'], inplace=True)
# other operations

or reassign:

df = df.sort_values(...)
# other operations
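
For completeness, here is a minimal end-to-end sketch of the reassignment variant, using the sample data from the question. The reset_index(drop=True) call is not part of the original code; it is an optional addition, assumed here only to give the sorted frame a clean 0..n-1 index.

```python
import pandas as pd

# Sample data copied from the question.
data = [
    {'time': 10, 'O': 1, 'A': 2, 'N': 3, 'value': 10},
    {'time': 7, 'O': 1, 'A': 2, 'N': 3, 'value': 11},
    {'time': 15, 'O': 1, 'A': 2, 'N': 3, 'value': 12},
    {'time': 11, 'O': 2, 'A': 2, 'N': 3, 'value': 20},
    {'time': 12, 'O': 2, 'A': 2, 'N': 3, 'value': 21},
    {'time': 1, 'O': 2, 'A': 2, 'N': 3, 'value': 25},
]
df = pd.DataFrame(data)

# sort_values returns a new DataFrame, so the result must be assigned back.
# reset_index(drop=True) is optional (an addition for readability).
df = df.sort_values(by=['O', 'A', 'N', 'time']).reset_index(drop=True)

# Within each (O, A, N) group, take the value from the next row in time order.
# The last row of each group has no successor and gets NaN.
df['value_next'] = df.groupby(['O', 'A', 'N'])['value'].shift(-1)

print(df)
```

With this data, value_next comes out as 10, 12, NaN for the O=1 group and 20, 21, NaN for the O=2 group. To show -1 instead of NaN as in the question's expected output, df['value_next'].fillna(-1) can be applied afterwards.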