1

Regarding this df:

               Amount                          type                 
Month_year 2019-06-01     2019-07-01     2019-06-01    2019-07-01   
TYPE_ID             1   2          1   2          1  2          1  2
ID                                                                  
100                20  10         40  20          1  1          2  1
200                80  60         30  10          2  2          1  1

The following code:

df = df.sort_index(axis=1, level=[1,2])

produces this:

               Amount       type     Amount  ...       type     Amount       type
Month_year 2019-06-01 2019-06-01 2019-06-01  ... 2019-07-01 2019-07-01 2019-07-01
TYPE_ID             1          1          2  ...          1          2          2
ID                                           ...                                 
100                20          1         10  ...          2         20          1
200                80          2         60  ...          1         10          1

I really don't understand what's going on. I've read the docs but there are no examples and the description is really obscure.

Could anyone explain to me how this method works and how I received this result?

7
  • 1
    It is sortng the columns, first sort by level = 1 (Month year), after alphabetically by level 2 Commented Jan 23, 2020 at 20:00
  • @ansev thanks. And what about the axis? Why when axis=0 the dataframe remains the same? What exactly does it do? Commented Jan 24, 2020 at 12:16
  • axis = 0, sort by the index Commented Jan 24, 2020 at 12:17
  • @ansev I don't understand. Could you elaborate more please? Commented Jan 24, 2020 at 18:44
  • The index here is ID that goes 100 and 200, and the dataset seems already sorted by the index. Sorting an already sorted dataset doesn't change anything. Commented Jan 26, 2020 at 1:09

1 Answer 1

7
+100

Essentially,

sort_index with axis=1 sorts the column headers, and this ordering is then used to set the order of the columns.

And, the corollary,

sort_index with axis=0 sorts the index, and this ordering is then used to set the order of the rows.


This is what your input df looks like:

enter image description here

The top three "rows" in the picture above correspond to a pandas MultiIndex column of df, which looks like this:

df.columns
MultiIndex([('Amount', '2019-06-01', 1),
            ('Amount', '2019-06-01', 2),
            ('Amount', '2019-07-01', 1),
            ('Amount', '2019-07-01', 2),
            (  'type', '2019-06-01', 1),
            (  'type', '2019-06-01', 2),
            (  'type', '2019-07-01', 1),
            (  'type', '2019-07-01', 2)])

Let's pretend your 3-level multiIndex column is magically converted into a DataFrame which each level its own column called cdf:

cdf
    level_0     level_1  level_2
(1)  Amount  2019-06-01        1
(2)  Amount  2019-06-01        2
(3)  Amount  2019-07-01        1
(4)  Amount  2019-07-01        2
(5)    type  2019-06-01        1
(6)    type  2019-06-01        2
(7)    type  2019-07-01        1
(8)    type  2019-07-01        2

The row numbers here correspond to the column identifiers in the original DataFrame. Let's see what happens when we sort cdf by the last two columns:

cdf.sort_values(['level_1', 'level_2'])

    level_0     level_1  level_2
(1)  Amount  2019-06-01        1
(5)    type  2019-06-01        1
(2)  Amount  2019-06-01        2
(6)    type  2019-06-01        2
(3)  Amount  2019-07-01        1
(7)    type  2019-07-01        1
(4)  Amount  2019-07-01        2
(8)    type  2019-07-01        2

Notice the index of the sorted cdf:

(1) (5) (2) (6) (3) (7) (4) (8)

Now let's see what happens when we apply the sort_index operation to df:

df.sort_index(level=[1, 2], axis=1)

enter image description here

The ellipses in the center indicate that not all columns can be displayed due to the width of the terminal (in fact, columns (6) and (3) are not displayed but they are there very much there), but that's not the interesting part. Contrast the column ordering here with the row ordering of sorted cdf, you'll see these are are identical.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.