1

I've the following dictionary:

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]
        }

I want to convert it to a pd.DataFrame(), expecting this:

id       ColA        ColB        ColC
0         1           4           4
1         2           4           6
2         3           4           0
3         1           4           
4         2           4
5         3           4
6         1
7         2
8         3

How can I do that? I'm trying

pd.DataFrame(dictAll.items(), columns=['ColA', 'ColB', 'ColC'])

But it obviously doesn't work!

1
  • you can use pd.DataFrame.from_dict(dictA) but the arrays on the dict need to be the same length all. Commented Nov 17, 2020 at 13:37

1 Answer 1

3

Here is how:

import pandas as pd
import numpy as np

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]}

df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)

Output:

   ColA ColB ColC
0     1    4    4
1     2    4    6
2     3    4    0
3     1    4     
4     2    4     
5     3    4     
6     1          
7     2          
8     3  

Now, let's have a look at the problem one step at a time.

  1. The first thing we might try is simply:

    df = pd.DataFrame(dictA)
    print(df)
    

    Which, of course, return this error:

     ValueError: arrays must all be same length
    
  2. So now we need a way to be able to create dataframes from a dict with arrays of different lengths. For that, we can:

    df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in dictA.items()]))
    print(df)
    

    Output:

               A          B          C
    0  [1, 2, 3]  [4, 4, 4]  [4, 6, 0]
    1  [1, 2, 3]  [4, 4, 4]        NaN
    2  [1, 2, 3]        NaN        NaN
    
  3. We want the dataframe to be vertical, so for each iteration, flatten out the lists with a list comprehension:

    df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()]))
    print(df)
    

    Output:

       A    B    C
    0  1  4.0  4.0
    1  2  4.0  6.0
    2  3  4.0  0.0
    3  1  4.0  NaN
    4  2  4.0  NaN
    5  3  4.0  NaN
    6  1  NaN  NaN
    7  2  NaN  NaN
    8  3  NaN  NaN
    
  4. Now we want to replace all the NaNs with blanks. For that, we need to import numpy as np, and do:

    df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()])).replace(np.nan, '')
    print(df)
    

    Output:

       A  B  C
    0  1  4  4
    1  2  4  6
    2  3  4  0
    3  1  4   
    4  2  4   
    5  3  4   
    6  1      
    7  2      
    8  3     
    
  5. Finally use formatted string to convert the letters into "Col" letters:

    df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
    print(df)
    

    Output:

       ColA ColB ColC
    0     1    4    4
    1     2    4    6
    2     3    4    0
    3     1    4     
    4     2    4     
    5     3    4     
    6     1          
    7     2          
    8     3  
    
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.