I created a dataframe using groupby and pd.cut to calculate the mean, standard deviation, and number of elements in each bin. I used agg(), and this is the command:

df_bin = df.groupby(pd.cut(df.In_X, ranges, include_lowest=True)).agg(['mean', 'std', 'size'])

df_bin looks like this:

                 X                  Y
                 mean   std size   mean         std  size
In_X                    
(10.424, 10.43] 10.425  NaN  1      0.003786    NaN   1
(10.43, 10.435] 10.4    NaN  0      NaN         NaN   0
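
For readers who want to reproduce a frame of this shape, here is a minimal sketch. The sample values and bin edges are made up for illustration, and `observed=False` is passed explicitly (an assumption, not part of the original command) so that empty bins are kept whatever the default is in the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df and ranges are not shown in the question.
df = pd.DataFrame({
    "In_X": [10.425, 10.441],
    "X": [10.425, 10.441],
    "Y": [0.003786, 0.004],
})
ranges = [10.424, 10.43, 10.435, 10.445]

# Bin the rows by In_X, then aggregate X and Y within each bin.
bins = pd.cut(df.In_X, ranges, include_lowest=True)
df_bin = df.groupby(bins, observed=False)[["X", "Y"]].agg(["mean", "std", "size"])

# The columns are a two-level MultiIndex: ('X', 'mean'), ('X', 'std'), ...
print(df_bin.columns.tolist())
```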

I want to create an array with the mean values under the first top-level header, X. If I didn't have two header levels, I would use something like:

mean=np.array(df_bin['mean'])

But how do I do that with the two header levels?

2 Answers

The pandas user guide on MultiIndex / advanced indexing would serve you well: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html

To answer your question, if you just want a particular column:

mean = np.array(df_bin['X', 'mean'])

But if you want to slice on the second level, which returns the 'mean' column under every top-level header:

mean = np.array(df_bin.loc[:, (slice(None), 'mean')])

Or, equivalently, with pd.IndexSlice:

mean = np.array(df_bin.loc[:, pd.IndexSlice[:, 'mean']])
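
To make the difference between these selections concrete, here is a small sketch. The frame is built by hand with made-up values (an assumption, only so the example is self-contained) and mimics the two-level columns of df_bin.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df_bin; the values are illustrative only.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df_bin = pd.DataFrame(
    [[10.425, np.nan, 1, 0.003786, np.nan, 1],
     [np.nan, np.nan, 0, np.nan, np.nan, 0]],
    index=pd.Index(["(10.424, 10.43]", "(10.43, 10.435]"], name="In_X"),
    columns=columns,
)

# A single column: pass both levels of the header.
mean_x = np.array(df_bin["X", "mean"])

# Every 'mean' column (X and Y): slice on the second level.
mean_all = np.array(df_bin.loc[:, (slice(None), "mean")])
mean_all_idx = np.array(df_bin.loc[:, pd.IndexSlice[:, "mean"]])

print(mean_x.shape)    # 1-D: one value per bin
print(mean_all.shape)  # 2-D: one column per top-level header
```

The first form gives a 1-D array for X only, while the two slicing forms give a 2-D array with one column each for X and Y.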

4 Comments

Thanks for the documentation. Your solution takes the mean values from both the X and Y top-level headers.
Oh, I misunderstood. If you just want 'X', then mean = np.array(df_bin['X', 'mean']) already works.
Worked great. Perhaps I should open a new question, but how do I use dropna() to drop only the rows where the mean of X is NaN?
You could always apply dropna() directly to df_bin['X', 'mean'] before you convert it to an array: mean = df_bin['X', 'mean'].dropna().values
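
A minimal sketch of that suggestion, using a hand-built stand-in for df_bin with made-up values (an assumption for illustration). It also shows one way to drop the matching rows from the whole frame with a boolean mask, which is a swapped-in technique rather than something stated in the thread.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df_bin; the values are illustrative only.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df_bin = pd.DataFrame(
    [[10.425, np.nan, 1, 0.003786, np.nan, 1],
     [np.nan, np.nan, 0, np.nan, np.nan, 0]],
    index=pd.Index(["(10.424, 10.43]", "(10.43, 10.435]"], name="In_X"),
    columns=columns,
)

# Array of X means with the NaN entries removed.
mean = df_bin["X", "mean"].dropna().values

# Keep only the rows of the full frame where the X mean is not NaN.
df_valid = df_bin[df_bin["X", "mean"].notna()]

print(mean)
print(df_valid.index.tolist())
```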

We can also move the top column level into the index with stack and then select 'mean':

df_bin.stack(level=0)['mean'].values
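
A short sketch of what this returns, using a hand-built stand-in for df_bin with made-up values (an assumption for illustration). Stacking level 0 moves X and Y into the row index, so selecting 'mean' yields the means for both headers; the extra `xs` step to keep only X is an addition that is not part of the answer. Depending on the pandas version, `stack` may emit a FutureWarning about its implementation.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df_bin; the values are illustrative only.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df_bin = pd.DataFrame(
    [[10.425, np.nan, 1, 0.003786, np.nan, 1],
     [np.nan, np.nan, 0, np.nan, np.nan, 0]],
    index=pd.Index(["(10.424, 10.43]", "(10.43, 10.435]"], name="In_X"),
    columns=columns,
)

# Rows become (bin, header) pairs; columns become mean/std/size.
stacked = df_bin.stack(level=0)

# Means for both X and Y, one entry per (bin, header) pair.
all_means = stacked["mean"].values

# Restrict to the X header by cross-section on the second index level.
x_means = stacked.xs("X", level=1)["mean"].values

print(all_means.shape)
print(x_means.shape)
```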
