The following code generates the pandas table named out.

import pandas as pd 
import numpy as np

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'], 
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'], 
                   'Position':[10, 33, -34, 87, 43, 99]})
df = df[['Book', 'Trader', 'Position']]

table = pd.pivot_table(df, index=['Book', 'Trader'], values=['Position'], aggfunc=np.sum)

print(table)

tab_tots = table.groupby(level='Book').sum()
tab_tots.index = [tab_tots.index, ['Total'] * len(tab_tots)]
print(tab_tots)

# Note: DataFrame.append was removed in pandas 2.0, so this chain needs pandas < 2.0.
out = pd.concat(
    [table, tab_tots]
).sort_index().append(
    table.sum().rename(('Grand', 'Total'))
)

The table out looks like this.

But I would like it to look like this.

Notice how the second table always puts 'Total' at the bottom of each book. I still want to sort alphabetically, but with 'Total' always last. Could someone provide an adjustment to my code that gives my desired output?

2 Answers

2

Pandas has built-in functionality within the pivot_table function to compute the marginal totals.

table = pd.pivot_table(df, 
               index='Book', 
               columns='Trader', 
               values='Position', 
               aggfunc=np.sum, 
               margins=True, 
               margins_name='Total').drop('Total').stack()
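# Sum only the per-book 'Total' rows; table.sum() would count every position twice.
table[('Grand', 'Total')] = table.xs('Total', level='Trader').sum()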
table.name = 'Position'
table.reset_index()

     Book Trader  Position
0      B1     T1      10.0
1      B1     Z2      33.0
2      B1  Total      43.0
3      B2     Z2     -34.0
4      B2  Total     -34.0
5      B3     T1      87.0
6      B3     T2      99.0
7      B3     U3      43.0
8      B3  Total     229.0
9   Grand  Total     238.0
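
To see what margins=True contributes before any dropping or stacking, here is a minimal sketch on the question's data. The names wide, subtotals and grand_total are illustrative only, and aggfunc='sum' is used in place of np.sum on the assumption of a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})

# margins=True appends a 'Total' row and a 'Total' column to the wide table.
wide = pd.pivot_table(df, index='Book', columns='Trader', values='Position',
                      aggfunc='sum', margins=True, margins_name='Total')

# The 'Total' column holds the per-book subtotals; the ('Total', 'Total')
# cell holds the grand total of all positions.
subtotals = wide['Total'].drop('Total')
grand_total = wide.loc['Total', 'Total']

print(subtotals)
print(grand_total)
```

Because the grand total is already in the margin cell, it can be read from there instead of being recomputed after stacking.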

Solution based on sorting multi-index

This solution starts from your out DataFrame. Convert Book and Trader to the pandas categorical type, passing ordered=True and a list of the categories in the order you want. Sorting the index then follows that category order instead of plain alphabetical order, so 'Total' and 'Grand' can be placed last.

out = out.reset_index()

trader_cats = pd.Categorical(out['Trader'], 
                   categories=sorted(df.Trader.unique()) + ['Total'], 
                   ordered=True)

book_cats = pd.Categorical(out['Book'], 
                   categories=sorted(df.Book.unique()) + ['Grand'], 
                   ordered=True)

out['Trader'] = trader_cats
out['Book'] = book_cats
out.set_index(['Book', 'Trader'], inplace=True)
out.sort_index(level=['Book', 'Trader'])

              Position
Book  Trader          
B1    T1            10
      Z2            33
      Total         43
B2    Z2           -34
      Total        -34
B3    T1            87
      T2            99
      U3            43
      Total        229
Grand Total        238
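
The effect of the ordered categorical can be seen in isolation on a toy column. This is a minimal sketch, not part of the answer above; the frame rows and the names alphabetical, order and custom are made up for illustration.

```python
import pandas as pd

# Small frame with a subtotal label mixed in among ordinary trader labels.
rows = pd.DataFrame({'Trader': ['Z2', 'Total', 'T1'],
                     'Position': [33, 43, 10]})

# Plain alphabetical sort puts 'Total' between 'T1' and 'Z2'.
alphabetical = rows.sort_values('Trader')['Trader'].tolist()

# Ordered categorical: alphabetical traders first, then 'Total' pinned last.
order = sorted(t for t in rows['Trader'].unique() if t != 'Total') + ['Total']
rows['Trader'] = pd.Categorical(rows['Trader'], categories=order, ordered=True)
custom = rows.sort_values('Trader')['Trader'].tolist()

print(alphabetical)  # ['T1', 'Total', 'Z2']
print(custom)        # ['T1', 'Z2', 'Total']
```

The same ordering is what sort_index uses once the categorical columns are moved into the index.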
1

You can use groupby with unstack to reshape. It is then easy to add a new Total column, compute the grand total, and stack back. Finally, add the grand-total row with loc:

df1 = df.groupby(['Book','Trader']).Position.sum().unstack()
df1['Total'] = df1.sum(1)
all_sum = df1['Total'].sum()
df1 = df1.stack()
df1.loc[('Grand','Total')] = all_sum
df1 = df1.reset_index(name='Position')
print (df1)
    Book Trader  Position
0     B1     T1      10.0
1     B1     Z2      33.0
2     B1  Total      43.0
3     B2     Z2     -34.0
4     B2  Total     -34.0
5     B3     T1      87.0
6     B3     T2      99.0
7     B3     U3      43.0
8     B3  Total     229.0
9  Grand  Total     238.0

Comparing with another solution:

def jez(df):
    df1 = df.groupby(['Book','Trader']).Position.sum().unstack()
    df1['Total'] = df1.sum(1)
    all_sum = df1['Total'].sum()
    df1 = df1.stack()
    df1.loc[('Grand','Total')] = all_sum
    df1 = df1.reset_index(name='Position')
    return (df1)


def ted1(df):
    table = pd.pivot_table(df, 
                           index=['Book'], 
                           columns=['Trader'], 
                           values=['Position'], 
                           aggfunc=np.sum, 
                           margins=True, 
                           margins_name='total')
    return table.stack()\
                  .rename({'total':'Total'})\
                  .reset_index(1)\
                  .rename({'Total':'Grand'})\
                  .reset_index()\
                  .query('Book != "Grand" | Trader == "Total"')


print (jez(df))
print (ted1(df))

In [419]: %timeit (jez(df))
100 loops, best of 3: 5.65 ms per loop

In [420]: %timeit (ted1(df))
10 loops, best of 3: 26.5 ms per loop
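
The %timeit figures above come from IPython. Outside IPython, a similar comparison could be sketched with the standard timeit module. The helpers with_groupby and with_pivot below are simplified stand-ins for the two reshaping steps, not the jez and ted1 functions themselves, and absolute timings will vary with machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'Book': ['B1', 'B1', 'B2', 'B3', 'B3', 'B3'],
                   'Trader': ['T1', 'Z2', 'Z2', 'T1', 'U3', 'T2'],
                   'Position': [10, 33, -34, 87, 43, 99]})


def with_groupby(df):
    # groupby + unstack, then a row-wise sum for the per-book subtotal
    wide = df.groupby(['Book', 'Trader'])['Position'].sum().unstack()
    wide['Total'] = wide.sum(axis=1)
    return wide


def with_pivot(df):
    # pivot_table computes the subtotal column itself via margins=True
    wide = pd.pivot_table(df, index='Book', columns='Trader',
                          values='Position', aggfunc='sum',
                          margins=True, margins_name='Total')
    return wide.drop('Total')  # drop the margin row, keep the margin column


# Check that both approaches agree on the subtotals before timing them.
assert with_groupby(df)['Total'].tolist() == with_pivot(df)['Total'].tolist()

for fn in (with_groupby, with_pivot):
    seconds = timeit.timeit(lambda: fn(df), number=100)
    print(f'{fn.__name__}: {seconds / 100 * 1e3:.2f} ms per call')
```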

Conclusion:

For subtotals, the groupby + unstack solution is faster, and summing the subtotals is also easier.

pivot_table makes the pivoting itself easier (one function), but manipulating the subtotal and total rows afterwards is more complicated.
