How do you sort data with multiindex (columns) dataframe?

Question

First, please forgive my bad vocabulary. Im still struggling with the correct terms, and have just discovered that I have created a multiindexed dataframe, which Im trying to learn how to manipulate.

The multiindex dataframe has 30 rows and 546 columns, and looks like a bigger version of this:

	A			B			C			D
	aa	bb	cc	aa	bb	cc	aa	bb	cc	aa	bb	cc
Date
2023-01-02	1	24	6	3	2	7	3	10	12	5	9	21
2023-01-03	1	23	7	3	4	6	3	9	13	6	10	22
2023-01-04	2	22	8	4	6	7	3	9	12	8	14	24
2023-01-05	3	21	10	3	8	6	4	8	11	10	12	21

The index is a timestamp date, and the top level (level 0?) column indexes A, B, C, D, etc each have the same 91 second level (level 1?) members: aa, bb, cc, etc

Since there are 546 columns in total, and 91 'level 1' columns, there must be 6 'level 0' columns. I cant see them cos the tables so big it just shows the first and last.

In reality, its a table of stock data pulled off yahoo where A, B, C are the (6) financial values like close, volume, high, etc and aa, bb, cc, etc are the (91) company codes.

Id like to learn how to do the following:

How to pull off a list of the 'level 0' column names.
How to pull off a list of the 'level 1' column names.
For 1 row (date), pull out the data for ALL 'level 0' and ONE 'level 1' index. (For example, all financial data for one company on one day).
For 1 row (date), ONE 'level 0' with ALL 'level 0' data. For example, volume data for all companies on one day.

Ive been trying things like:

df.loc[:,(['A','B'],['aa,'bb', 'cc'])]
df.loc['2023-01-02', :]

which work, but I cant sort the brackets and colons right to do the above stuff.

Also,

 df.loc[:,(['A','D'],['aa,'cc','ff'])]

and

df.loc['2023-01-05':,([A,C],[aa,dd])]

work, but

df.loc['2023-01-05',([A:],[aa,dd])]

and

df.loc['2023-01-05',(A:,[aa,dd])]

give invalid syntax. Can anyone explain, or maybe point me towards a tutorial that will help with the level definitions and round/square brackets and colons?

Thanks.

akuiper · Accepted Answer · 2023-01-08 19:55:04Z

3

To pull a list of level column names, you can use get_level_values:

df.columns.get_level_values(0)
#Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'], dtype='object')

df.columns.get_level_values(1)
#Index(['aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc'], dtype='object')

df.columns.get_level_values(0).unique()
#Index(['A', 'B', 'C', 'D'], dtype='object')

df.columns.get_level_values(1).unique()
#Index(['aa', 'bb', 'cc'], dtype='object')

For 3 and 4, pd.IndexSlice would be convenient to use:

# all level zero data for a specific level one index
df.loc['2023-01-05', pd.IndexSlice[:, 'aa']]

#A  aa     3
#B  aa     3
#C  aa     4
#D  aa    10
#Name: 2023-01-05, dtype: int64

# all level one data for a specific level zero index

df.loc['2023-01-05', pd.IndexSlice['A', :]]
#A  aa     3
#   bb    21
#   cc    10
#Name: 2023-01-05, dtype: int64

answered Jan 8, 2023 at 19:55

akuiper

216k33 gold badges362 silver badges379 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

AndysPythonStuff Over a year ago

So easy when you know how! Thanks, Psidom, works perfectly!

Collectives™ on Stack Overflow

How do you sort data with multiindex (columns) dataframe?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related