
Suppose I have a MultiIndex DataFrame similar to an example from the MultiIndex docs.

>>> df 
               0   1   2   3
first second                
bar   one      0   1   2   3
      two      4   5   6   7
baz   one      8   9  10  11
      two     12  13  14  15
foo   one     16  17  18  19
      two     20  21  22  23
qux   one     24  25  26  27
      two     28  29  30  31

I want to generate a NumPy array from this DataFrame with a 3-dimensional structure like

>>> desired_arr
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])

How can I do so?

Hopefully it is clear what is happening here - I am effectively unstacking the DataFrame by the first level and then trying to turn each top level in the resulting column MultiIndex to its own 2-dimensional array.

I can get half way there with

>>> df.unstack(1)
         0       1       2       3    
second one two one two one two one two
first                                 
bar      0   4   1   5   2   6   3   7
baz      8  12   9  13  10  14  11  15
foo     16  20  17  21  18  22  19  23
qux     24  28  25  29  26  30  27  31

but then I am struggling to find a nice way to turn each column into a 2-dimensional array and then join them together, beyond doing so explicitly with loops and lists.

I feel like there should be some way for me to specify the shape of my desired NumPy array beforehand, fill it with np.nan and then use a specific iteration order to fill in the values from my DataFrame, but I have not managed to solve the problem with this approach yet.


To generate the sample DataFrame

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8*4).reshape((8, 4)), index=ind)

2 Answers


Some reshape and swapaxes magic -

df.values.reshape(4,2,-1).swapaxes(1,2)

Generalizable to -

m,n = len(df.index.levels[0]), len(df.index.levels[1])
arr = df.values.reshape(m,n,-1).swapaxes(1,2)

Basically, this splits the first axis into two axes of lengths 4 and 2, creating a 3D array, and then swaps the last two axes, i.e. pushes the axis of length 2 to the back (as the last one).
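To see the two steps separately, here is a small sketch on the question's sample frame. The intermediate names `split` and `via_unstack` are only for illustration; the last lines cross-check the result against the `unstack` route from the question.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

# Step 1: split the 8 rows into 4 first-level groups of 2 second-level rows
split = df.values.reshape(4, 2, -1)   # axes: (first, second, column)

# Step 2: swap the last two axes
arr = split.swapaxes(1, 2)            # axes: (first, column, second)

# Cross-check: unstack(1) orders its columns as (column, second),
# so each row of the unstacked frame reshapes to a (4, 2) block
via_unstack = df.unstack(1).values.reshape(4, 4, 2)
assert np.array_equal(arr, via_unstack)
```

Tracking which axis stands for which index level makes it easier to adapt the axis order to other layouts.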

Sample output -

In [35]: df.values.reshape(4,2,-1).swapaxes(1,2)
Out[35]: 
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]],

       [[16, 20],
        [17, 21],
        [18, 22],
        [19, 23]],

       [[24, 28],
        [25, 29],
        [26, 30],
        [27, 31]]])

5 Comments

This is perfect, and easily generalizable if using the length of the levels instead of hardcoding. Thanks Divakar!
@EricHansen Added a generalized version.
Awesome. I'd be happy to ask this in another question, or award a bounty, but out of curiosity and to try and understand your method, I tried to slightly modify my problem. That is, have [[[0, 4], [8, 12], [16, 20], ..... instead - form the 3d array by grouping by the second multi-index level then moving down instead of across. But I couldn't adapt your solution easily - do you think this is a trivial change to the problem or requires a different soln. entirely?
@EricHansen I am guessing instead of .swapaxes(1,2), we need .transpose(2,0,1).
Note that this only works if the MultiIndex contains every first × second combination and the rows are sorted in the proper order, which happens to be the case in the example.
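A quick check of the `.transpose(2,0,1)` suggestion on the sample frame, together with a guard for the precondition raised in the last comment. The helper name `is_full_product` is hypothetical, and the guard is only one possible way to test that precondition.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

def is_full_product(index):
    """True if the index equals the full product of its levels, in level order.

    Hypothetical helper, not part of pandas.
    """
    idx = index.remove_unused_levels()
    full = pd.MultiIndex.from_product(idx.levels, names=idx.names)
    return idx.equals(full)

# The plain reshape is only meaningful when this holds
assert is_full_product(df.index)

m, n = len(df.index.levels[0]), len(df.index.levels[1])
# Axes after reshape are (first, second, column); move column to the front
by_column = df.values.reshape(m, n, -1).transpose(2, 0, 1)
```

Here `by_column[0]` holds the values of column 0 laid out as first × second, which is the grouping asked for in the follow-up comment.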

To complete the answer of @divakar, here is a multidimensional generalisation:

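# sort values by index
A = df.sort_index()

# fill missing index combinations with 0
for idx in A.index.names:
    A = A.unstack(idx).fillna(0).stack(1)

# one dimension per index level; the trailing -1 keeps the value columns
# as the last axis (needed when df has more than one column)
reshape_size = tuple(len(x) for x in A.index.levels) + (-1,)

# reshape, then swap the first two index axes
arr = np.reshape(A.values, reshape_size).swapaxes(0, 1)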
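An alternative sketch of the same idea for any number of index levels, using `reindex` against the full product of the levels instead of the unstack/stack loop. The function name `frame_to_ndarray` and its `fill_value` parameter are hypothetical. Unlike the snippet above, it leaves the axes in index-level order followed by the columns, so any axis swap is left to the caller.

```python
import numpy as np
import pandas as pd

def frame_to_ndarray(df, fill_value=0):
    """Reshape a MultiIndex-ed frame to (len(level_0), ..., len(level_k), n_columns).

    Hypothetical helper; assumes the index has no duplicate entries.
    """
    index = df.index.remove_unused_levels()
    full = pd.MultiIndex.from_product(index.levels, names=index.names)
    # Missing combinations become rows of fill_value; rows come back in product order
    filled = df.reindex(full, fill_value=fill_value)
    shape = tuple(len(level) for level in index.levels) + (df.shape[1],)
    return filled.to_numpy().reshape(shape)

# Sample frame from the question
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
ind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.arange(8 * 4).reshape((8, 4)), index=ind)

# Drop one row and reverse the order to mimic an incomplete, unsorted index
partial = df.drop(index=('baz', 'two')).iloc[::-1]
arr = frame_to_ndarray(partial).swapaxes(1, 2)   # axes: (first, column, second)
```

On the complete, sorted sample frame this gives the same array as `df.values.reshape(4,2,-1).swapaxes(1,2)`; on `partial`, the missing ('baz', 'two') row shows up as zeros.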
