6

I have imported data into dataframe that looks like this

           VMGI US Equity   VMGI Open   VMGI High  VMGI Low  VMGI Px_last   VMGI Volume SPOM US Equity  SPOM Open   SPOM High   SPOM Low   SPOM Px_last SPOM Volume
Date
12/31/2012  12/31/2012      0.009       0.011      0.009         0.009      105726      12/31/2012      0.4575      0.4575      0.2925      0.3975       8890
1/1/2013    1/1/2013        0.009       0.011      0.009         0.009      105726      1/1/2013        0.4575      0.4575     0.2925      0.3975      8890
1/2/2013    1/2/2013        0.009       0.01       0.008         0.01       188150      1/2/2013        0.3975      0.3975     0.3225      0.3225      3400
1/3/2013    1/3/2013        0.011       0.018      0.011         0.015       169890     1/3/2013        0.34        0.3738     0.28        0.29       48933
1/4/2013    1/4/2013        0.015       0.018      0.014         0.018       33500      1/4/2013        0.36        0.4        0.3175      0.3175      3610

Each 6th column is a new stock. The rows go on for 1340 rows. I want to re-organize in a multi-index (I think) to create data that looks like this because I wanted to add additional columns for each stock. I was able to get the stock names with the following code

index2 =index1[0::6]     >>> which results in an object as follows (the first column for each stock) 
Index(['VMGI US Equity', 'SPOM US Equity', 'OPTL US Equity', 'FRHV US Equity', etc....

Ultimately I want the dataframe to look like a index with each stock

VMGI US Equity  VMGI US Equity  VMGI Open   VMGI High   VMGI Low    VMGI Px_last    VMGI Volume
                 12/31/2012       0.009      0.011      0.009       0.009            105726
                 1/1/2013         0.009      0.011      0.009       0.009            105726
                 1/2/2013         0.009      0.01       0.008       0.01             188150
                 1/3/2013         0.011      0.018      0.011       0.015          169890
                 1/4/2013         0.015      0.018      0.014       0.018          33500
SPOM US Equity  SPOM US Equity  SPOM Open   SPOM High   SPOM Low    SPOM Px_last    SPOM Volume
                12/31/2012       0.4575     0.4575      0.2925      0.3975          8890
                1/1/2013         0.4575     0.4575      0.2925      0.3975          8890

I have tried set_index and got the follow error.

df2.index = df_clean_penny1.set_index(index2)
ValueError: Length mismatch: Expected axis has 1340 elements, new values have 65 elements

From other posts I have also tried MultiIndex.from_arrays() but also could not get it to work. any help/guidance is appreciated

1
  • Let me guess, Bloomberg? Commented Feb 20, 2018 at 0:21

1 Answer 1

7

You can use the str accessor for pd.Index objects and create a pd.MultiIndex with split and the expand=True argument

df.columns = df.columns.str.split(' ', 1, expand=True)

Then you can stack the first level of the column index you just created

df.stack(0)

                   High     Low    Open  Px_last   US Equity  Volume
Date                                                                
12/31/2012 SPOM  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
           VMGI  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
1/1/2013   SPOM  0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
           VMGI  0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
1/2/2013   SPOM  0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
           VMGI  0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
1/3/2013   SPOM  0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
           VMGI  0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
1/4/2013   SPOM  0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
           VMGI  0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

A variation of this without editing the columns object in place would be to use the set_axis method. pd.DataFrame.set_axis as of Pandas version 0.21 now accepts an inplace=False argument which allows for pipelining.

df.set_axis(df.columns.str.split(' ', 1, expand=True), 1, 0).stack(0)

                   High     Low    Open  Px_last   US Equity  Volume
Date                                                                
12/31/2012 SPOM  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
           VMGI  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
1/1/2013   SPOM  0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
           VMGI  0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
1/2/2013   SPOM  0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
           VMGI  0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
1/3/2013   SPOM  0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
           VMGI  0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
1/4/2013   SPOM  0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
           VMGI  0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

To take is one step further, we can swap the levels of the index and sort to improve the layout.

df.set_axis(df.columns.str.split(' ', 1, expand=True), 1, 0).stack(0) \
    .swaplevel(0, 1).sort_index().reindex(df.index, level=1)

                   High     Low    Open  Px_last   US Equity  Volume
     Date                                                           
SPOM 12/31/2012  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
     1/1/2013    0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
     1/2/2013    0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
     1/3/2013    0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
     1/4/2013    0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
VMGI 12/31/2012  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
     1/1/2013    0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
     1/2/2013    0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
     1/3/2013    0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
     1/4/2013    0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

Strictly speaking, that last bit with the reindex isn't exactly necessary. But it bothered me that I might be rearranging dates. So I put them back in place.

Sign up to request clarification or add additional context in comments.

7 Comments

This is so clean and elegant.
I will give it a try. ty. Yes Bloomberg (one of the non developers/engineers) as you can tell by the question
@piRSquared is there a way to ultimately end up with stock as first level in multi-index and then combine the full series of dates with each individual stock. So that I could somehow "call/index" a single individual stock out of DF?
@JWestwood I believe what I've been mucking about with does that. See my updates and let me know if that answers the question.
it will take me some time to figure out what you've done but it seems to work well. marked as correct. ty
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.