2

I am working with a JSON response that is formatted like a many-nested dictionary below:

{u'addresses': [],
 u'application_ids': [20855193],
 u'applications': [{u'answers': [{u'answer': u'Indeed ',
                                  u'question': u'How did you hear?'}],
                    u'applied_at': u'2015-10-29T22:19:04.925Z',
                    u'candidate_id': 9999999,
                    u'credited_to': None,
                    u'current_stage': {u'id': 9999999,
                                       u'name': u'Application Review'},
                    u'id': 9999999,
                    u'jobs': [{u'id': 9999999,u'name': u'ENGINEER'}],
                    u'last_activity_at': u'2015-10-29T22:19:04.767Z',
                    u'prospect': False,
                    u'rejected_at': None,
                    u'rejection_details': None,
                    u'rejection_reason': None,
                    u'source': {u'id': 7, u'public_name': u'Indeed'},
                    u'status': u'active'}],
 u'attachments': [{u'filename': u'Jason_Bourne.pdf',
                   u'type': u'resume',
                   u'url': u'https://resumeURL'}],
 u'company': None,
 u'coordinator': {u'employee_id': None,
                  u'id': 9999999,
                  u'name': u'Batman_Robin'},
 u'email_addresses': [{u'type': u'personal',
                       u'value': u'[email protected]'}],
 u'first_name': u'Jason',
 u'id': 9999999,
 u'last_activity': u'2015-10-29T22:19:04.767Z',
 u'last_name': u'Bourne',
 u'website_addresses': []}

I am trying to flatten the JSON into a table and have found the following example on the pandas documentation:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html

For some reason, any data directly under the 'applications' header returns as one character per row. For example, if I call:

timeapplied = json_normalize(data,['applications', ['applied_at']])

I get:

 0
0   2
1   0
2   1
3   5
4   -
5   1
6   0
7   -
8   2
9   9
10  T
11  2
12  2
13  :
14  1
15  9
16  :
17  0
18  4
19  .
20  9
21  2
22  5
23  Z

Is there any way around this so I can use the normalize function?

Thanks!

1
  • What's your desired output? Commented Dec 30, 2016 at 0:06

1 Answer 1

1

Your call:

timeapplied = json_normalize(data,['applications', ['applied_at']])

A call to json_normalize consists of the parameters shown below,

pandas.io.json.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None)

You are passing ['applications', ['applied_at']] as the record_path. Apparently, this means that the data provided under data['applications]['applied_at'] is used as an array of records. In this case, the string is used as a list of characters. Hence, you obtain rows corresponding to each character.

To simply obtain all the data under the 'applications' header as a dataframe, use:

applied = json_normalize(data, 'applications')

To obtain applied_at as an individual column, then use:

applied_at = applied.applied_at

or

applied_at = applied['applied_at']
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.