I'm looking for a way to merge multiple JSONL files from one folder using a Python script, something like the script below, which works for JSON files.

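import json
import glob

result = []
for f in glob.glob("*.json"):
    # Plain JSON files: parse each file as a single document.
    with open(f) as infile:
        result.append(json.load(infile))

# json.dump writes text, so the output is opened in text mode.
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)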

Below is a sample of my JSONL file (only one line):

{"date":"2021-01-02T08:40:11.378000000Z","partitionId":"0","sequenceNumber":"4636458","offset":"1327163410568","iotHubDate":"2021-01-02T08:40:11.258000000Z","iotDeviceId":"text","iotMsg":{"header":{"deviceTokenJwt":"text","msgType":"text","msgOffset":3848,"msgKey":"text","msgCreation":"2021-01-02T09:40:03.961+01:00","appName":"text","appVersion":"text","customerType":"text","customerGroup":"Customer"},"msgData":{"serialNumber":"text","machineComponentTypeId":"text","applicationVersion":"3.1.4","bootloaderVersion":"text","firstConnectionDate":"2018-02-20T10:34:47+01:00","lastConnectionDate":"2020-12-31T12:05:04.113+01:00","counters":[{"type":"DurationCounter","id":"text","value":"text"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":2423},{"type":"IntegerCounter","id":"text","value":9914},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":976},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"IntegerCounter","id":"text","value":28},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":1}],"defects":[{"description":"ProtocolDb.ProtocolIdNotFound","defectLevelId":"Warning","occurrence":3},{"description":"BridgeBus.CrcError","defectLevelId":"Warning","occurrence":1},{"description":"BridgeBus.Disconnected","defectLevelId":"Warning","occurrence":6}],"maintenanceEvents":[{"interventionId":"Other","comment":"text","appearance_display":0,"intervention_date":"2018-11-29T09:52:16.726+01:00","intervention_counterValue":"text","intervention_workerName":"text"},{"interventionId":"Other","comment":"text","appearance_display":0,"intervention_date":"2019-06-04T15:30:15.954+02:00","intervention_counterValue":"text","intervention_workerName":"text"}]}}}

Does anyone know how I can load this?

2 Answers

2

Since each line in a JSONL file is a complete JSON object, you don't need to parse the JSONL files at all in order to merge them into another JSONL file. Instead, merge them by simply concatenating them. The caveat is that the JSONL format does not mandate a newline character at the end of the file. You therefore have to check whether the last line of each file ends with a newline character and, if it does not, explicitly write one so that the first record of the next file starts on its own line:

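import glob

output_name = "merged_file.json"
# Collect the inputs first and skip the output file so it never reads itself.
input_names = [f for f in glob.glob("*.json") if f != output_name]
with open(output_name, "w") as outfile:
    for filename in input_names:
        with open(filename) as infile:
            line = ""  # reset so an empty file cannot reuse a stale value
            for line in infile:
                outfile.write(line)
            # Write the separator only if the last line lacks one.
            if line and not line.endswith("\n"):
                outfile.write("\n")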
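
If the records need to be loaded into Python rather than just concatenated, the same line-by-line loop can parse each line. The following is only a minimal sketch, assuming every non-blank line is a valid JSON object; the function name `load_jsonl_records` and the default `*.jsonl` pattern are illustrative choices, not something from the question (which uses `*.json`).

```python
import glob
import json
import os


def load_jsonl_records(folder, pattern="*.jsonl"):
    """Return every JSON object found in the matching files, in file order."""
    records = []
    # sorted() makes the merge order deterministic; glob order is arbitrary.
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, encoding="utf-8") as infile:
            for line in infile:
                line = line.strip()
                if line:  # tolerate blank lines and a missing final newline
                    records.append(json.loads(line))
    return records
```

The returned list can then be written as a single JSON array with `json.dump(records, outfile)`, with the output file opened in text mode (`"w"`).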

6 Comments

Thanks @blhsing for your answer, but unfortunately I don't get the correct data (missing rows)!
Please update the question with a sample of your input files then.
I like this answer but if any of the jsonl files lack a newline at the end, you'll get missing rows. You likely need to process line by line to see if there is a final newline there.
@tdelaney Ahh thanks. I did not realize that JSONL allows a file to end without a newline character. Updated the answer accordingly then.
@Arvind Unless I misunderstood the OP's question, I think the OP means to merge the JSONL files into another JSONL file, rather than into a JSON file.
1

You can update a main dict with every JSON object you load, like this:

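import json
import glob

result = {}
for f in glob.glob("*.json"):
    with open(f) as infile:
        # JSONL: each non-blank line is its own JSON object.
        for line in infile:
            if line.strip():
                result.update(json.loads(line))  # merge the dicts

# json.dump writes text, so the output is opened in text mode.
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)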

But this will overwrite duplicate keys!
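
To make the overwrite concrete: `dict.update` keeps only the last value seen for each key, so records that share the same top-level keys (as every line of the sample file presumably does) collapse into a single record. A small sketch with made-up records, shown next to a list that keeps them all:

```python
import json

# Two hypothetical JSONL lines that share the same top-level keys.
lines = [
    '{"sequenceNumber": "1", "iotDeviceId": "a"}',
    '{"sequenceNumber": "2", "iotDeviceId": "b"}',
]

merged = {}
collected = []
for line in lines:
    record = json.loads(line)
    merged.update(record)     # later values replace earlier ones
    collected.append(record)  # a list keeps every record

print(merged)          # {'sequenceNumber': '2', 'iotDeviceId': 'b'}
print(len(collected))  # 2
```

If every record has to survive the merge, appending to a list is probably the safer choice.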

1 Comment

When I run your code, I get this error: EOFError
