
Each line is valid JSON, but I need the file as a whole to be valid JSON.

I have some data which is aggregated from a web service and dumped to a file, so it's JSON-esque but not valid JSON, which means it can't be processed in the simple, intuitive way that valid JSON files can. This constitutes a major pain in the neck. It looks (more or less) like this:

{"record":"value0","block":"0x79"} 
{"record":"value1","block":"0x80"} 

I've been trying to reinterpret it as valid JSON; my latest attempt looks like this:

with open('toy.json') as inpt:
    lines = []
    for line in inpt:
        if line.startswith('{'):  # block starts
            lines.append(line) 

However, as you can likely deduce from the fact that I'm posing this question, that doesn't work. Any ideas about how I might tackle this problem?

EDIT:

Tried this:

import json

with open('toy_two.json', 'rb') as inpt:
    lines = [json.loads(line) for line in inpt]

print(lines['record'])

but got the following error:

Traceback (most recent call last):
  File "json-ifier.py", line 38, in <module>
    print(lines['record'])
TypeError: list indices must be integers, not str

Ideally I'd like to interact with it as I can with normal JSON, i.e. data['value']

EDIT II:

import json

with open('transactions000000000029.json', 'rb') as inpt:
    lines = [json.loads(line) for line in inpt]

# collect the 'hash' field from every record, then print them
records = [item['hash'] for item in lines]
for item in records:
    print(item)
  • Is each line valid JSON? e.g. does lines = [json.loads(line) for line in inpt] do the job? Commented Sep 16, 2017 at 17:00
  • lines.append(json.loads(line))? Commented Sep 16, 2017 at 17:00
  • yes, but I don't want to process each line; I want to process the file as a whole. The real one has millions of records Commented Sep 16, 2017 at 17:04
  • In what way does [json.loads(line) for line in inpt] not constitute "processing the file as a whole"? Commented Sep 16, 2017 at 17:08
  • I'm quite confused now. If this file were valid JSON, it would be a list, right? What type do you want to interpret it as? Commented Sep 16, 2017 at 17:10

2 Answers


This looks like NDJSON (newline-delimited JSON), a format I've been working with recently. There is a specification for it, though I'm not sure how useful it is. Does the following work?

import json

with open('the file.json', 'rb') as infile:
    data = infile.readlines()
    # strip the trailing newline from each line before parsing it
    data = [json.loads(item.replace('\n', '')) for item in data]

This should give you a list of dictionaries.
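For instance, with the two-line toy data from the question (a minimal sketch, assuming the file is named toy.json), you index into the list first and then into each dictionary:

import json

with open('toy.json') as infile:
    data = [json.loads(line) for line in infile]

# data is a list of dicts, so index the list first...
print(data[0]['record'])  # -> value0

# ...or pull one key out of every record at once
records = [item['record'] for item in data]
print(records)  # -> ['value0', 'value1']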


13 Comments

  • when I tried it out just now I got this error: print(data['record']) TypeError: list indices must be integers, not str. How can I verify that this works?
  • Because this parses the file and gives you a list of dictionaries, not a dictionary.
  • but I want to interact with it like I can with JSON; in normal JSON I can call things like data['record'], you know what I mean?
  • damn, I'm sorry, it was exactly the data[0]['record'] issue. Thank you for your great help! :)
  • @s.matthew.english it's still a list, so items() is out. records = [item['record'] for item in data] should do it? I guess the point of the format is that every line is valid JSON, but the file as a whole is not. I find this a bit uncomfortable too, but you do just have a list of dictionaries, so if you know how to iterate through lists and grab things by key, it's not that bad.

Each line looks like a valid JSON document.

That's "JSON Lines" format (http://jsonlines.org/)

Try to process each line independently (json.loads(line)) or use a specialized library (https://jsonlines.readthedocs.io/en/latest/).

import json

def process(oneline):
    # do what you want with each line
    print(oneline['record'])

with open('toy_two.json', 'rb') as inpt:
    for line in inpt:
        process(json.loads(line))
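If you'd rather not call json.loads yourself, here is a minimal sketch using the jsonlines library linked above (an assumption: it's installed with pip install jsonlines, and the file is the toy_two.json from the question):

import jsonlines

# jsonlines parses one JSON document per line, so each obj
# arrives as an already-decoded dict
with jsonlines.open('toy_two.json') as reader:
    for obj in reader:
        print(obj['record'])

This keeps the one-record-at-a-time memory profile of the loop above, since the reader streams the file line by line.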

5 Comments

  • I'd like to process the file as a whole, as the real one has millions of records
  • So? You can just iterate over each line of the input file as you do in your code, and apply json.loads(line) inside the for loop.
  • sounds expensive, I want to do it cheap and fast
  • If you store all parsed lines in a global list, then yes, this is going to be expensive in RAM. If you process each line independently, you only use a bit of memory for the current line. That's "flow-based programming".
  • ok cool, it was just the data[0]['record'] issue. Anyway, thank you for these great insights!
