How to transform a 3d arrays into a dataframe in python

Question

I have a 3d array as follows:

    ThreeD_Arrays = np.random.randint(0, 1000, (5, 4, 3))
    
    array([[[715, 226, 632],
            [305,  97, 534],
            [ 88, 592, 902],
            [172, 932, 263]],

           [[895, 837, 431],
            [649, 717,  39],
            [363, 121, 274],
            [334, 359, 816]],

           [[520, 692, 230],
            [452, 816, 887],
            [688, 509, 770],
            [290, 856, 584]],

           [[286, 358, 462],
            [831,  26, 332],
            [424, 178, 642],
            [955,  42, 938]], 

           [[ 44, 119, 757],
            [908, 937, 728],
            [809,  28, 442],
            [832, 220, 348]]])

Now I would like to have it into a DataFrame like this:

Add a Date column like indicated and the column names A, B, C.

How to do this transformation? Thanks!

ajwood · Accepted Answer · 2017-09-10 14:53:23Z

8

Based on the answer to this question, we can use a MultiIndex. First, create the MultiIndex and a flattened DataFrame.

A = np.random.randint(0, 1000, (5, 4, 3))

names = ['x', 'y', 'z']
index = pd.MultiIndex.from_product([range(s)for s in A.shape], names=names)
df = pd.DataFrame({'A': A.flatten()}, index=index)['A']

Now we can reshape it however we like:

df = df.unstack(level='x').swaplevel().sort_index()
df.columns = ['A', 'B', 'C']
df.index.names = ['DATE', 'i']

This is the result:

          A    B    C
DATE i           
0    0  715  226  632
     1  895  837  431
     2  520  692  230
     3  286  358  462
     4   44  119  757
1    0  305   97  534
     1  649  717   39
     2  452  816  887
     3  831   26  332
     4  908  937  728
2    0   88  592  902
     1  363  121  274
     2  688  509  770
     3  424  178  642
     4  809   28  442
3    0  172  932  263
     1  334  359  816
     2  290  856  584
     3  955   42  938
     4  832  220  348

edited Sep 10, 2017 at 14:53

answered Sep 10, 2017 at 14:45

ajwood

19.2k16 gold badges65 silver badges111 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

rwolst Over a year ago

The first part of this answer is the nicest way I believe to convert a multidimensional Numpy array to a Pandas dataframe.

Evan Rosica Over a year ago

The pandas panel has been depreciated, so this should now be the accepted answer.

bers Over a year ago

Why do you need pd.DataFrame({'A': A.flatten()}, index=index)['A'] instead of simply pd.DataFrame(A.flatten(), index=index)?

unutbu · Accepted Answer · 2018-08-17 11:39:06Z

4

You could convert your 3D array to a Pandas Panel, then flatten it to a 2D DataFrame (using .to_frame()):

import numpy as np
import pandas as pd
np.random.seed(2016)

arr = np.random.randint(0, 1000, (5, 4, 3))
pan = pd.Panel(arr)
df = pan.swapaxes(0, 2).to_frame()
df.index = df.index.droplevel('minor')
df.index.name = 'Date'
df.index = df.index+1
df.columns = list('ABC')

yields

        A    B    C
Date               
1     875  702  266
1     940  180  971
1     254  649  353
1     824  677  745
...
4     675  488  939
4     382  238  225
4     923  926  633
4     664  639  616
4     770  274  378

Alternatively, you could reshape the array to shape (20, 3), form the DataFrame as usual, and then fix the index:

import numpy as np
import pandas as pd
np.random.seed(2016)

arr = np.random.randint(0, 1000, (5, 4, 3))
df = pd.DataFrame(arr.reshape(-1, 3), columns=list('ABC'))
df.index = np.repeat(np.arange(arr.shape[0]), arr.shape[1]) + 1
df.index.name = 'Date'
print(df)

yields the same result.

edited Aug 17, 2018 at 11:39

answered Feb 20, 2016 at 15:32

unutbu

886k197 gold badges1.9k silver badges1.7k bronze badges

1 Comment

sty1882 Over a year ago

Many thanks! unutbu! it works! however I would like to plot the data in a seaborn like disgram, and need to have column name more complicated than 'ABC', say it should be "Abc", "Def", "Ghi" how it works? And the seaborn like disgram I like to have is impletemented as follows: df = pd.melt(df, id_vars=["date"], var_name="condition") ax = df.groupby(["condition", "date"]).mean().unstack("condition").plot() x = np.arange(1990,1993) palette = sns.color_palette() Sources: stackoverflow.com/questions/22795348/…

Jarad · Accepted Answer · 2016-02-21 07:14:05Z

3

ThreeD_Arrays = np.random.randint(0, 1000, (5, 4, 3))
df = pd.DataFrame([list(l) for l in ThreeD_Arrays]).stack().apply(pd.Series).reset_index(1, drop=True)
df.index.name = 'Date'
df.columns = list('ABC')

answered Feb 21, 2016 at 7:14

Jarad

19.2k20 gold badges105 silver badges165 bronze badges

1 Comment

sty1882 Over a year ago

Thanks @Jarad! my new question is how to make the resulted df from your code to fit the following code: df = pd.melt(df, id_vars=["date"], var_name="condition") ax = df.groupby(["condition", "date"]).mean().unstack("condition").plot() and how to replace list('ABC') with column name "AAA", "BBB", "CCC"?

Collectives™ on Stack Overflow

How to transform a 3d arrays into a dataframe in python

3 Answers 3

3 Comments

1 Comment

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related