1

First, please forgive my bad vocabulary. Im still struggling with the correct terms, and have just discovered that I have created a multiindexed dataframe, which Im trying to learn how to manipulate.

The multiindex dataframe has 30 rows and 546 columns, and looks like a bigger version of this:

A B C D
aa bb cc aa bb cc aa bb cc aa bb cc
Date
2023-01-02 1 24 6 3 2 7 3 10 12 5 9 21
2023-01-03 1 23 7 3 4 6 3 9 13 6 10 22
2023-01-04 2 22 8 4 6 7 3 9 12 8 14 24
2023-01-05 3 21 10 3 8 6 4 8 11 10 12 21

The index is a timestamp date, and the top level (level 0?) column indexes A, B, C, D, etc each have the same 91 second level (level 1?) members: aa, bb, cc, etc

Since there are 546 columns in total, and 91 'level 1' columns, there must be 6 'level 0' columns. I cant see them cos the tables so big it just shows the first and last.

In reality, its a table of stock data pulled off yahoo where A, B, C are the (6) financial values like close, volume, high, etc and aa, bb, cc, etc are the (91) company codes.

Id like to learn how to do the following:

  1. How to pull off a list of the 'level 0' column names.

  2. How to pull off a list of the 'level 1' column names.

  3. For 1 row (date), pull out the data for ALL 'level 0' and ONE 'level 1' index. (For example, all financial data for one company on one day).

  4. For 1 row (date), ONE 'level 0' with ALL 'level 0' data. For example, volume data for all companies on one day.

Ive been trying things like:

df.loc[:,(['A','B'],['aa,'bb', 'cc'])]
df.loc['2023-01-02', :]

which work, but I cant sort the brackets and colons right to do the above stuff.

Also,

 df.loc[:,(['A','D'],['aa,'cc','ff'])]

and

df.loc['2023-01-05':,([A,C],[aa,dd])]

work, but

df.loc['2023-01-05',([A:],[aa,dd])]

and

df.loc['2023-01-05',(A:,[aa,dd])]

give invalid syntax. Can anyone explain, or maybe point me towards a tutorial that will help with the level definitions and round/square brackets and colons?

Thanks.

1 Answer 1

3

To pull a list of level column names, you can use get_level_values:

df.columns.get_level_values(0)
#Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'], dtype='object')

df.columns.get_level_values(1)
#Index(['aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc', 'aa', 'bb', 'cc'], dtype='object')

df.columns.get_level_values(0).unique()
#Index(['A', 'B', 'C', 'D'], dtype='object')

df.columns.get_level_values(1).unique()
#Index(['aa', 'bb', 'cc'], dtype='object')

For 3 and 4, pd.IndexSlice would be convenient to use:

# all level zero data for a specific level one index
df.loc['2023-01-05', pd.IndexSlice[:, 'aa']]

#A  aa     3
#B  aa     3
#C  aa     4
#D  aa    10
#Name: 2023-01-05, dtype: int64

# all level one data for a specific level zero index

df.loc['2023-01-05', pd.IndexSlice['A', :]]
#A  aa     3
#   bb    21
#   cc    10
#Name: 2023-01-05, dtype: int64
Sign up to request clarification or add additional context in comments.

1 Comment

So easy when you know how! Thanks, Psidom, works perfectly!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.