I have a 3-dimensional numpy array with shape (z, x, y), where z is a time dimension and x and y are coordinates.

I want to convert this to a multiindexed pandas.DataFrame. I want the row index to be the z dimension and each column to have values from a unique x, y coordinate (and so, each column would be multi-indexed).

The simplest case (not multi-indexed):

>>> array.shape
(500L, 120L, 100L)

>>> df = pd.DataFrame(array[:,0,0])

>>> df.shape
(500, 1)

I've been trying to pass the whole array into a multiindex dataframe using pd.MultiIndex.from_arrays but I'm getting an error: NotImplementedError: > 1 ndim Categorical are not supported at this time

It looks like it should be fairly simple, but I can't figure it out.

2 Answers

I find that a Series with a MultiIndex is the most analogous pandas data type for a numpy array with arbitrarily many dimensions (presumably 3 or more).

Here is some example code:

import pandas as pd
import numpy as np

time_vals = np.linspace(1, 50, 50)
x_vals = np.linspace(-5, 6, 12)
y_vals = np.linspace(-4, 5, 10)

measurements = np.random.rand(50,12,10)

#setup multiindex
mi = pd.MultiIndex.from_product([time_vals, x_vals, y_vals], names=['time', 'x', 'y'])

#connect multiindex to data and save as multiindexed Series
sr_multi = pd.Series(index=mi, data=measurements.flatten())

#pull out a dataframe of x, y at time=22
sr_multi.xs(22, level='time').unstack(level=0)

#pull out a dataframe of y, time at x=3
sr_multi.xs(3, level='x').unstack(level=1)
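
If the goal is the layout described in the question (one row per time step, one column per (x, y) pair), a minimal sketch is to reshape the array and attach a two-level column index. This is not part of the original answer; the name df_wide is only illustrative, and the coordinate values are rebuilt here with np.arange so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Same shapes as the example above: (time, x, y)
time_vals = np.arange(1, 51)   # 50 time steps
x_vals = np.arange(-5, 7)      # 12 x coordinates
y_vals = np.arange(-4, 6)      # 10 y coordinates
measurements = np.random.rand(50, 12, 10)

# One (x, y) label per column, in the same row-major order that reshape uses
cols = pd.MultiIndex.from_product([x_vals, y_vals], names=['x', 'y'])

# Collapse the two spatial axes so each row holds one time step
df_wide = pd.DataFrame(
    measurements.reshape(len(time_vals), -1),
    index=pd.Index(time_vals, name='time'),
    columns=cols,
)

print(df_wide.shape)
```

A single cell can then be read with df_wide.loc[22, (3, -1)], which should correspond to measurements[21, 8, 3] under the coordinate values assumed here.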

1 Comment

Great answer to commonly asked question(s) about wrangling 3D numpy arrays into pandas. Much easier to understand than others I have seen. Bravo @Selah !

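I think you can use Panel, and then call to_frame to get a MultiIndex DataFrame (note that pd.Panel was deprecated in pandas 0.20 and removed in later releases, so this only works on older versions):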

np.random.seed(10)
arr = np.random.randint(10, size=(5,3,2))
print (arr)
[[[9 4]
  [0 1]
  [9 0]]

 [[1 8]
  [9 0]
  [8 6]]

 [[4 3]
  [0 4]
  [6 8]]

 [[1 8]
  [4 1]
  [3 6]]

 [[5 3]
  [9 6]
  [9 1]]]

df = pd.Panel(arr).to_frame()
print (df)
             0  1  2  3  4
major minor               
0     0      9  1  4  1  5
      1      4  8  3  8  3
1     0      0  9  0  4  9
      1      1  0  4  1  6
2     0      9  8  6  3  9
      1      0  6  8  6  1

Also transpose can be useful:

df = pd.Panel(arr).transpose(1,2,0).to_frame()
print (df)
             0  1  2
major minor         
0     0      9  0  9
      1      1  9  8
      2      4  0  6
      3      1  4  3
      4      5  9  9
1     0      4  1  0
      1      8  0  6
      2      3  4  8
      3      8  1  6
      4      3  6  1

Another possible solution with concat:

arr = arr.transpose(1,2,0)
# one key per 2-D slice, so the keys must follow the first axis of the transposed array
df = pd.concat([pd.DataFrame(x) for x in arr], keys=np.arange(arr.shape[0]))
print (df)
    0  1  2  3  4
0 0  9  1  4  1  5
  1  4  8  3  8  3
1 0  0  9  0  4  9
  1  1  0  4  1  6
2 0  9  8  6  3  9
  1  0  6  8  6  1

With an array of the same shape as in the question:

np.random.seed(10)
arr = np.random.randint(10, size=(500,120,100))
df = pd.Panel(arr).transpose(2,0,1).to_frame()
print (df.shape)
(60000, 100)

print (df.index.max())
(499, 119)
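
On pandas versions where Panel is no longer available, roughly the same frame can be built with a plain reshape. The following is a sketch under that assumption, not the method from this answer; the level names 'major' and 'minor' are chosen only to mirror the Panel output above.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
arr = np.random.randint(10, size=(500, 120, 100))
n_z, n_x, n_y = arr.shape

# Row labels: every (z, x) pair, in the same row-major order that reshape uses
rows = pd.MultiIndex.from_product(
    [np.arange(n_z), np.arange(n_x)], names=['major', 'minor']
)

# Merge the first two axes into rows; the last axis becomes the columns
df = pd.DataFrame(arr.reshape(n_z * n_x, n_y), index=rows)

print(df.shape)
```

With this layout, df.loc[(z, x), y] holds arr[z, x, y], so there are 500 values in major, 120 in minor, and 100 columns.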

8 Comments

Thanks! This is getting close, but the shape of the data is not right. I'm looking for 500 rows (as "major") with 0 and 1 as minor, as in your initial example, but I'm getting 500 columns instead. I've tried different permutations of transpose, but it is still not quite right.
Do you need 500 rows in major, 120 or 100 in minor, and 100 or 120 columns?
Maybe you need .transpose(1,0,2) if you want 120 columns.
Yes, I'm looking for 500 rows in major, 120 in minor, and 100 columns. .transpose(1,0,2) doesn't do the trick.
So you need .transpose(2,0,1).