Create pd.DataFrame from dictionary with multi-dimensional array

Question

I've the following dictionary:

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]
        }

I want to convert it to a pd.DataFrame(), expecting this:

id       ColA        ColB        ColC
0         1           4           4
1         2           4           6
2         3           4           0
3         1           4           
4         2           4
5         3           4
6         1
7         2
8         3

How can I do that? I'm trying

pd.DataFrame(dictAll.items(), columns=['ColA', 'ColB', 'ColC'])

But it obviously doesn't work!

you can use pd.DataFrame.from_dict(dictA) but the arrays on the dict need to be the same length all. — Capie
– Capie, Commented Nov 17, 2020 at 13:37

Wai Ha Lee · Accepted Answer · 2020-11-17 18:46:00Z

Here is how:

import pandas as pd
import numpy as np

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]}

df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)

Output:

   ColA ColB ColC
0     1    4    4
1     2    4    6
2     3    4    0
3     1    4     
4     2    4     
5     3    4     
6     1          
7     2          
8     3

Now, let's have a look at the problem one step at a time.

The first thing we might try is simply:

df = pd.DataFrame(dictA)
print(df)

Which, of course, return this error:

 ValueError: arrays must all be same length

So now we need a way to be able to create dataframes from a dict with arrays of different lengths. For that, we can:

df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in dictA.items()]))
print(df)

Output:

           A          B          C
0  [1, 2, 3]  [4, 4, 4]  [4, 6, 0]
1  [1, 2, 3]  [4, 4, 4]        NaN
2  [1, 2, 3]        NaN        NaN

We want the dataframe to be vertical, so for each iteration, flatten out the lists with a list comprehension:

df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()]))
print(df)

Output:

   A    B    C
0  1  4.0  4.0
1  2  4.0  6.0
2  3  4.0  0.0
3  1  4.0  NaN
4  2  4.0  NaN
5  3  4.0  NaN
6  1  NaN  NaN
7  2  NaN  NaN
8  3  NaN  NaN

Now we want to replace all the NaNs with blanks. For that, we need to import numpy as np, and do:

df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()])).replace(np.nan, '')
print(df)

Output:

Finally use formatted string to convert the letters into "Col" letters:

df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)

Output:

   ColA ColB ColC
0     1    4    4
1     2    4    6
2     3    4    0
3     1    4     
4     2    4     
5     3    4     
6     1          
7     2          
8     3

Collectives™ on Stack Overflow

Create pd.DataFrame from dictionary with multi-dimensional array

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related