How to custom sort pandas multi-index?

Question

The following code generates the pandas table named out.

import pandas as pd 
import numpy as np

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'], 
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'], 
                   'Position':[10, 33, -34, 87, 43, 99]})
df = df[['Book', 'Trader', 'Position']]

table = pd.pivot_table(df, index=['Book', 'Trader'], values=['Position'], aggfunc=np.sum)

print(table)

tab_tots = table.groupby(level='Book').sum()
tab_tots.index = [tab_tots.index, ['Total'] * len(tab_tots)]
print(tab_tots)

out = pd.concat(
    [table, tab_tots]
).sort_index().append(
    table.sum().rename(('Grand', 'Total'))
)

The table out looks like

But I would like it to look like

Notice how the second table always puts the 'Total' at the bottom. So basically I still want to sort alphabetically but I would like to always put 'Total' last. Could someone provide an adjustment to my code that gives my desired output?

Ted Petrou · Accepted Answer · 2017-01-11 20:42:26Z

Pandas has built-in functionality within the pivot_table function to compute the marginal totals.

table = pd.pivot_table(df, 
               index='Book', 
               columns='Trader', 
               values='Position', 
               aggfunc=np.sum, 
               margins=True, 
               margins_name='Total').drop('Total').stack()
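# Take the grand total from the raw data: table.sum() would also add in the per-book 'Total' rows
table[('Grand', 'Total')] = df['Position'].sum()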
table.name = 'Position'
table.reset_index()

     Book Trader  Position
0      B1     T1      10.0
1      B1     Z2      33.0
2      B1  Total      43.0
3      B2     Z2     -34.0
4      B2  Total     -34.0
5      B3     T1      87.0
6      B3     T2      99.0
7      B3     U3      43.0
8      B3  Total     229.0
13  Grand  Total     238.0

Solution based on sorting multi-index

This solution picks up where your analysis left off, starting from your out DataFrame. You can convert Book and Trader to the pandas categorical type, which supports a custom sort order: pass ordered=True together with a list of the categories in the order you want them sorted.

out = out.reset_index()

trader_cats = pd.Categorical(out['Trader'], 
                   categories=sorted(df.Trader.unique()) + ['Total'], 
                   ordered=True)

book_cats = pd.Categorical(out['Book'], 
                   categories=sorted(df.Book.unique()) + ['Grand'], 
                   ordered=True)

out['Trader'] = trader_cats
out['Book'] = book_cats
out.set_index(['Book', 'Trader'], inplace=True)
out.sort_index(level=['Book', 'Trader'])

              Position
Book  Trader          
B1    T1            10
      Z2            33
      Total         43
B2    Z2           -34
      Total        -34
B3    T1            87
      T2            99
      U3            43
      Total        229
Grand Total        238
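
The steps above depend on the out table from the question, which is built with DataFrame.append, a method that was removed in pandas 2.0. The following is a minimal self-contained sketch of the same ordered-categorical idea that avoids it. The names detail, subtotals, grand, flat and result are illustrative and not part of the original answer, and building the totals with groupby and concat is an assumption about one reasonable way to do it.

```python
import pandas as pd

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})

# Per-trader rows, per-book subtotal rows and the grand-total row as flat frames.
detail = df.groupby(['Book', 'Trader'], as_index=False)['Position'].sum()
subtotals = (df.groupby('Book', as_index=False)['Position'].sum()
               .assign(Trader='Total'))
grand = pd.DataFrame({'Book': ['Grand'], 'Trader': ['Total'],
                      'Position': [df['Position'].sum()]})

flat = pd.concat([detail, subtotals, grand], ignore_index=True)

# Ordered categoricals: real labels alphabetically, the total labels last.
flat['Book'] = pd.Categorical(flat['Book'],
                              categories=sorted(df['Book'].unique()) + ['Grand'],
                              ordered=True)
flat['Trader'] = pd.Categorical(flat['Trader'],
                                categories=sorted(df['Trader'].unique()) + ['Total'],
                                ordered=True)

# Sorting categorical columns follows the category order, not the alphabet.
result = flat.sort_values(['Book', 'Trader']).set_index(['Book', 'Trader'])
print(result)
```

Sorting the flat columns before calling set_index keeps the ordering step independent of how the MultiIndex is built afterwards.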

jezrael · Accepted Answer · 2017-01-11 20:24:55Z

You can use groupby with unstack to reshape the data. It is then easy to add a new Total column, compute the grand total, and stack the result back. Finally, add the grand-total row with loc:

df1 = df.groupby(['Book','Trader']).Position.sum().unstack()
df1['Total'] = df1.sum(1)
all_sum = df1['Total'].sum()
df1 = df1.stack()
df1.loc[('Grand','Total')] = all_sum
df1 = df1.reset_index(name='Position')
print (df1)
    Book Trader  Position
0     B1     T1      10.0
1     B1     Z2      33.0
2     B1  Total      43.0
3     B2     Z2     -34.0
4     B2  Total     -34.0
5     B3     T1      87.0
6     B3     T2      99.0
7     B3     U3      43.0
8     B3  Total     229.0
9  Grand  Total     238.0

Comparing with another solution:

def jez(df):
    df1 = df.groupby(['Book','Trader']).Position.sum().unstack()
    df1['Total'] = df1.sum(1)
    all_sum = df1['Total'].sum()
    df1 = df1.stack()
    df1.loc[('Grand','Total')] = all_sum
    df1 = df1.reset_index(name='Position')
    return (df1)


def ted1(df):
    table = pd.pivot_table(df, 
                           index=['Book'], 
                           columns=['Trader'], 
                           values=['Position'], 
                           aggfunc=np.sum, 
                           margins=True, 
                           margins_name='total')
    return table.stack()\
                  .rename({'total':'Total'})\
                  .reset_index(1)\
                  .rename({'Total':'Grand'})\
                  .reset_index()\
                  .query('Book != "Grand" | Trader == "Total"')


print (jez(df))
print (ted1(df))

In [419]: %timeit (jez(df))
100 loops, best of 3: 5.65 ms per loop

In [420]: %timeit (ted1(df))
10 loops, best of 3: 26.5 ms per loop

Conclusion:

For subtotals, the groupby + unstack solution is faster, and summing the subtotals is also easier.

pivot_table makes the pivoting itself easier (a single function call), but manipulating the subtotal and total rows afterwards is more complicated.
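
If the totals already exist and only the row order is wrong, as in the question, one further option is to sort on temporary boolean flag columns rather than on the labels alone. This is a rough sketch, not taken from either answer: the frame flat reproduces the question's values in plain alphabetical order, and the helper column names _book_last and _trader_last are hypothetical.

```python
import pandas as pd

# The question's table, flattened, in plain alphabetical order
# ('Total' lands in the middle of each book).
flat = pd.DataFrame({
    'Book':   ['B1', 'B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B3', 'B3', 'Grand'],
    'Trader': ['T1', 'Total', 'Z2', 'Total', 'Z2', 'T1', 'T2', 'Total', 'U3', 'Total'],
    'Position': [10, 43, 33, -34, -34, 87, 99, 229, 43, 238],
})

ordered = (
    flat.assign(_book_last=flat['Book'].eq('Grand'),      # True only for the grand-total row
                _trader_last=flat['Trader'].eq('Total'))  # True only for total rows
        # False sorts before True, so flagged rows sink to the bottom of their group.
        .sort_values(['_book_last', 'Book', '_trader_last', 'Trader'])
        .drop(columns=['_book_last', '_trader_last'])
        .set_index(['Book', 'Trader'])
)
print(ordered)
```

This keeps the labels as ordinary strings, at the cost of two throwaway columns during the sort.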
