1

I have four 2d numpy arrays:

import numpy as np
import pandas as pd    
x1 = np.array([[2, 4, 1],
       [2, 2, 1],
       [1, 3, 3],
       [2, 2, 1],
       [3, 3, 2]])
x2 = np.array([[1, 2, 2],
       [4, 1, 4],
       [1, 4, 4],
       [3, 3, 2],
       [2, 2, 4]])

x3 = np.array([[4, 3, 2],
       [4, 3, 2],
       [4, 3, 3],
       [1, 2, 2],
       [1, 4, 3]])      
x4 = np.array([[3, 1, 1],
       [3, 4, 3],
       [2, 2, 1],
       [2, 1, 1],
       [1, 2, 4]])

And I would like to create a dataframe as following:

level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])
df = pd.DataFrame(np.concatenate((x1,x1,x3,x4),axis=1), columns=header)
df.index.name = 'Time'
df

enter image description here

Data in this DataFrame is not in the desired form.

I want the four columns (x1,x2,x3,x4) in the first level column label (location1) should be created by taking the first columns from all the numpy arrays. The next four columns (x1,x2,x3,x4) ie. the four columns in the second first level column label (location2) should be created by taking second columns from all four numpy arrays and so on. The length of first level column label ie. len(level_1_label) will be equal to the number of columns in all four 2d numpy arrays.

Desired DataFrame:

enter image description here

2 Answers 2

2

One option is to reverse the order in creating the MultiIndex column (since level_1_label corresponds to the columns and level_2_label corresponds to the arrays); then swaplevel + sort_index (to get it in the desired order) after building the DataFrame:

level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_2_label, level_1_label], names=['Variable','Location'])
df = pd.DataFrame(np.concatenate((x1,x2,x3,x4),axis=1), columns=header).swaplevel(axis=1).sort_index(level=0, axis=1)
df.index.name = 'Time'

Output:

Location    location1          location2          location3         
Variable    x1 x2 x3 x4        x1 x2 x3 x4        x1 x2 x3 x4
Time                                                          
0            2  1  4  3         4  2  3  1         1  2  2  1
1            2  4  4  3         2  1  3  4         1  4  2  3
2            1  1  4  2         3  4  3  2         3  4  3  1
3            2  3  1  2         2  3  2  1         1  2  2  1
4            3  2  1  1         3  2  4  2         2  4  3  4
Sign up to request clarification or add additional context in comments.

Comments

1

One option is to reshape the data in Fortran order, before creating the dataframe:


# reusing your code
level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])

# np.vstack is just a convenience wrapper around np.concatenate, axis=1
outcome = np.reshape(np.vstack([x1,x2,x3,x4]), (len(x1), -1), order = 'F')
df = pd.DataFrame(outcome, columns = header)
df.index.name = 'Time'

df

Location location1          location2          location3
Variable        x1 x2 x3 x4        x1 x2 x3 x4        x1 x2 x3 x4
Time
0                2  1  4  3         4  2  3  1         1  2  2  1
1                2  4  4  3         2  1  3  4         1  4  2  3
2                1  1  4  2         3  4  3  2         3  4  3  1
3                2  3  1  2         2  3  2  1         1  2  2  1
4                3  2  1  1         3  2  4  2         2  4  3  4

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.