Convert json to dataframe in python

Question

I have a JSON file (sample below) and I am trying to create a DataFrame from it using Python. JSON:

{"data": {
    "A": [
        {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"},
        {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"}
    ],
    "B": [
        {"CREATION_DATE": "1487808000", "SOURCE_COUNT": "1048"},
        {"CREATION_DATE": "1487894400", "SOURCE_COUNT": "1103"}
    ]
}}

When I try to convert it into a DataFrame:

My code:

import json
import pandas as pd

file = 'mysample.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

Output:

    index      A                                                   B
0   data    [{'CREATION_DATE': '1482105600', 'SOURCE_COUNT...   [{'CREATION_DATE': '1487808000', 'SOURCE_COUNT...

Instead, I am looking to convert it into a DataFrame that looks like this:

system  CREATION_DATE   SOURCE_COUNT
A        1482105600        0
A        1482105600        0
B        1487808000        1048
B        1487894400        1103

How can I modify my code to get the expected output?
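A minimal sketch of why the attempt above produces a single row. It assumes the sample is parsed from an inline string with json.loads instead of being read from mysample.json. from_dict(..., orient='index') turns each top-level key into a row, and the only top-level key is "data", so each system's whole list of records ends up in a single cell:

```python
import json

import pandas as pd

# Inline copy of the sample JSON (stands in for mysample.json)
raw = """
{"data": {
    "A": [{"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"},
          {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"}],
    "B": [{"CREATION_DATE": "1487808000", "SOURCE_COUNT": "1048"},
          {"CREATION_DATE": "1487894400", "SOURCE_COUNT": "1103"}]
}}
"""
dict_train = json.loads(raw)

print(list(dict_train))  # ['data']: the only top-level key

train = pd.DataFrame.from_dict(dict_train, orient='index')
print(train.shape)  # (1, 2): one row ('data'), one column per system

cell = train.loc['data', 'A']
print(type(cell), len(cell))  # <class 'list'> 2: the whole list of records sits in one cell
```

This is why the answers below all start from dict_train['data'] and then flatten each list of records into separate rows.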

filippo · Accepted Answer · 2018-06-11 10:54:34Z

4

pd.DataFrame(dict_train['data']).stack().apply(pd.Series).reset_index(level=0, drop=True).sort_index()


  CREATION_DATE SOURCE_COUNT
A    1482105600            0
A    1482105600            0
B    1487808000         1048
B    1487894400         1103
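In this result the system label is still the (unnamed) index rather than a system column as in the question. A small extension, not part of the original answer, is to name the index and move it into a column. The data is inlined here so the sketch runs on its own:

```python
import pandas as pd

# Inline copy of the sample; in the question dict_train comes from json.load
dict_train = {"data": {
    "A": [{"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"},
          {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"}],
    "B": [{"CREATION_DATE": "1487808000", "SOURCE_COUNT": "1048"},
          {"CREATION_DATE": "1487894400", "SOURCE_COUNT": "1103"}],
}}

df = (
    pd.DataFrame(dict_train['data'])     # columns A and B, one record dict per cell
    .stack()                             # Series indexed by (list position, system)
    .apply(pd.Series)                    # expand each dict into CREATION_DATE / SOURCE_COUNT
    .reset_index(level=0, drop=True)     # drop the list position, keep the system label
    .sort_index()                        # put the rows of each system together
    .rename_axis('system')               # added step: name the index ...
    .reset_index()                       # added step: ... and turn it into a regular column
)
print(df)
```

The values stay strings (as in the JSON). This chain also relies on every system having the same number of records, because pd.DataFrame on a dict of lists with different lengths raises a ValueError.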

answered Jun 11, 2018 at 10:54

filippo


2 Comments

FHTMitchell Over a year ago

Not gonna lie, this answer is much cleaner and more efficient than mine. I'd have accepted this.

A. STEFANI Over a year ago

Agree! Best method for sure.

FHTMitchell · Accepted Answer · 2018-06-11 10:55:07Z

2

Here is an answer that builds the rows in pure Python (NB: on Python 3.5 or lower, replace dict with collections.OrderedDict to keep the key order).

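import pandas as pd

# dict_train is the dict loaded with json.load in the question
data = []
for system, values in dict_train['data'].items():
    for value in values:
        # one flat record per entry, tagged with the system it belongs to
        data.append(dict(system=system, **value))

df = pd.DataFrame.from_records(data)

Output: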

  CREATION_DATE SOURCE_COUNT system
0    1482105600            0      A
1    1482105600            0      A
2    1487808000         1048      B
3    1487894400         1103      B
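A sketch of a variant of the same idea using pd.concat: each system's list becomes its own small DataFrame, tagged with the system name, and the column order is set explicitly. That matters because the order from_records picks for a list of dicts has changed between pandas versions (newer versions follow the dict key order, which puts system first). Unlike the stack() one-liner, this does not need every system to have the same number of records. System "C" below is a made-up input added only to show that:

```python
import pandas as pd

# Inline copy of dict_train['data'] from the question, plus a hypothetical
# system "C" with a single record to show that list lengths may differ
data = {
    "A": [{"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"},
          {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"}],
    "B": [{"CREATION_DATE": "1487808000", "SOURCE_COUNT": "1048"},
          {"CREATION_DATE": "1487894400", "SOURCE_COUNT": "1103"}],
    "C": [{"CREATION_DATE": "1487894400", "SOURCE_COUNT": "7"}],  # hypothetical
}

# One small DataFrame per system, every row tagged with its system name
frames = [pd.DataFrame(records).assign(system=system)
          for system, records in data.items()]

# Stack them vertically with a fresh 0..n-1 index, then fix the column order explicitly
df = pd.concat(frames, ignore_index=True)[["system", "CREATION_DATE", "SOURCE_COUNT"]]
print(df)
```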

answered Jun 11, 2018 at 10:55

FHTMitchell

A. STEFANI · Accepted Answer · 2018-06-11 11:02:41Z

2

This code works (but it does not read the JSON file; the data is written inline as a Python dict):

import pandas as pd

# the sample data, written inline instead of loaded from the JSON file
current_dict = {"data": {
    "A": [
        {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"},
        {"CREATION_DATE": "1482105600", "SOURCE_COUNT": "0"}
    ],
    "B": [
        {"CREATION_DATE": "1487808000", "SOURCE_COUNT": "1048"},
        {"CREATION_DATE": "1487894400", "SOURCE_COUNT": "1103"}
    ]
}}

my_list = []

# iterate over your data: system
for system in current_dict["data"]:

    # iterate over data: system > sub-system
    for sub_system in current_dict["data"][system]:

        # convert the string values to integers
        creation_date = int(sub_system["CREATION_DATE"])
        source_count = int(sub_system["SOURCE_COUNT"])

        # add to list
        my_list.append([system, creation_date, source_count])

# convert to a pandas DataFrame (adding column names)
df = pd.DataFrame(my_list, columns=["system", "creation_date", "source_count"])

print(df)

Output:

  system  creation_date  source_count
0      A     1482105600             0
1      A     1482105600             0
2      B     1487808000          1048
3      B     1487894400          1103
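The first two answers keep CREATION_DATE and SOURCE_COUNT as strings, because that is how they appear in the JSON, while this answer converts them with int(). A sketch of the same conversion applied to an already-built DataFrame with astype. The extra CREATION_DT column is a hypothetical name, and it assumes, without confirmation from the question, that CREATION_DATE holds Unix timestamps in seconds (UTC):

```python
import pandas as pd

# Rows as produced by the string-based answers above (values are still str)
df = pd.DataFrame({
    "system": ["A", "A", "B", "B"],
    "CREATION_DATE": ["1482105600", "1482105600", "1487808000", "1487894400"],
    "SOURCE_COUNT": ["0", "0", "1048", "1103"],
})

# Parse the numeric strings into integers, like int() in the loop above
df = df.astype({"CREATION_DATE": "int64", "SOURCE_COUNT": "int64"})

# Assumption: CREATION_DATE is a Unix timestamp in seconds; CREATION_DT is a hypothetical column
df["CREATION_DT"] = pd.to_datetime(df["CREATION_DATE"], unit="s", utc=True)
print(df)
print(df.dtypes)
```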

answered Jun 11, 2018 at 11:02

A. STEFANI

1 Comment

A. STEFANI Over a year ago

It is probably better to refer to @filippo's answer, which is the best (and cleanest) method!
