5

I want to put the following data into pandas for further analysis.

import numpy as np
import pandas as pd
from pandas import DataFrame

data = np.array([[[1, 1, 1, np.nan, 1], [np.nan, 1, 1, 1, 1]],
                 [[2, np.nan, 2, 2, 2], [2, np.nan, 2, 2, 2]],
                 [[3, 3, 3, np.nan, 3], [3, 3, 3, 3, np.nan]]])

pnda = pd.Series(data)

print(pnda)

But the following error occurs:

Exception: Data must be 1-dimensional

What is a good way to do this? My further analysis is to fill the np.nan values by interpolation, using a cubic or polynomial method, and output the result as a numpy array.

12
  • Use a DataFrame for multidimensional data, not a Series. Commented May 2, 2014 at 16:23
  • @Ffisegydd it seems that DataFrame only accepts 2-D arrays... Commented May 2, 2014 at 16:24
  • @neha do you really need to pass a 3-D array to Pandas? Commented May 2, 2014 at 16:26
  • Ah yes, sorry. I think DataFrames are the way to go rather than Series (which is typically a 1D "array", I think), but I don't know how to handle ND arrays with N greater than 2. Commented May 2, 2014 at 16:26
  • @SaulloCastro yes, because pandas has a built-in function for interpolating missing data. Commented May 2, 2014 at 16:26

2 Answers

3

Try using a panel:

import numpy as np
import pandas as pd

data = np.array([[[1, 1, 1, np.nan, 1], [np.nan, 1, 1, 1, 1]],
                 [[2, np.nan, 2, 2, 2], [2, np.nan, 2, 2, 2]],
                 [[3, 3, 3, np.nan, 3], [3, 3, 3, 3, np.nan]]])

x = pd.Panel(data)
x

<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 2 (major_axis) x 5 (minor_axis)
Items axis: 0 to 2
Major_axis axis: 0 to 1
Minor_axis axis: 0 to 4

And...

print(x.loc[0])
    0  1  2   3  4
0   1  1  1 NaN  1
1 NaN  1  1   1  1
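
For readers on a recent pandas: pd.Panel was removed in pandas 0.25, so the snippet above only runs on older versions. A rough equivalent, sketched here as one possible layout rather than a drop-in replacement, is a DataFrame with a two-level MultiIndex built from the first two axes. The level names "item" and "major" are illustrative choices borrowed from the Panel axis names.

```python
import numpy as np
import pandas as pd

data = np.array([[[1, 1, 1, np.nan, 1], [np.nan, 1, 1, 1, 1]],
                 [[2, np.nan, 2, 2, 2], [2, np.nan, 2, 2, 2]],
                 [[3, 3, 3, np.nan, 3], [3, 3, 3, 3, np.nan]]])

n_items, n_major, n_minor = data.shape

# Two index levels stand in for the first two array axes
# (level names are hypothetical, chosen for this sketch).
index = pd.MultiIndex.from_product(
    [range(n_items), range(n_major)], names=["item", "major"]
)

# Each row of the frame is one row along the last axis of the array.
df = pd.DataFrame(data.reshape(-1, n_minor), index=index)

# Selecting on the first level gives the 2 x 5 slice for item 0,
# much like x.loc[0] did on the Panel.
print(df.loc[0])
```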

2 Comments

Note that interpolate isn't implemented on Panels, iirc, which was the OP's original need. You'll probably need to iterate over the items of the panel.
@TomAugspurger or maybe a nested apply 0_o
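
The "iterate over the items" idea from the comments can be sketched without a Panel by looping over the first axis of the array. This is only an illustration: it assumes linear interpolation along the last axis, and limit_direction="both" is my own choice so that NaNs at the start of a row are filled too.

```python
import numpy as np
import pandas as pd

data = np.array([[[1, 1, 1, np.nan, 1], [np.nan, 1, 1, 1, 1]],
                 [[2, np.nan, 2, 2, 2], [2, np.nan, 2, 2, 2]],
                 [[3, 3, 3, np.nan, 3], [3, 3, 3, 3, np.nan]]])

# Interpolate each 2-D item on its own, then stack the results back
# into a 3-D array with the original shape.
filled_items = np.stack([
    pd.DataFrame(item).interpolate(axis=1, limit_direction="both").to_numpy()
    for item in data
])
```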
2

Based on your comments, you can achieve what you want if you reshape data, interpolate using the DataFrame.interpolate() method, and then return the array to its original shape. It works for pandas 0.13.1.

df = pd.DataFrame(data.reshape(2, -1))
df.interpolate(axis=1).values.reshape(data.shape)
#array([[[1, 1, 1, 1, 1],
#        [1, 1, 1, 1, 1]],
#
#       [[2, 2, 2, 2, 2],
#        [2, 2, 2, 2, 2]],
#
#       [[3, 3, 3, 3, 3],
#        [3, 3, 3, 3, 3]]], dtype=int64)
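
One caveat: data.reshape(2, -1) joins several of the original rows into one long row, so interpolation can run across row boundaries. It happens to work for this data because neighbouring values are equal. A variant that keeps each row along the last axis separate might look like the sketch below. It assumes a recent pandas and linear interpolation, and limit_direction="both" is an assumption about how NaNs at the start of a row should be handled. Passing method="polynomial" or method="cubic" should also be possible, but those methods need SciPy installed and enough valid points per row.

```python
import numpy as np
import pandas as pd

data = np.array([[[1, 1, 1, np.nan, 1], [np.nan, 1, 1, 1, 1]],
                 [[2, np.nan, 2, 2, 2], [2, np.nan, 2, 2, 2]],
                 [[3, 3, 3, np.nan, 3], [3, 3, 3, 3, np.nan]]])

# One DataFrame row per row along the last axis: shape (6, 5) here.
rows = pd.DataFrame(data.reshape(-1, data.shape[-1]))

# Interpolate along each row independently; "both" also fills NaNs that
# sit at the start of a row (assumed to be the desired behaviour).
filled = rows.interpolate(axis=1, limit_direction="both").to_numpy()

# Restore the original 3-D shape.
filled = filled.reshape(data.shape)
```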
