I am trying to merge multiple JSON files into a single JSON file. Here's my code, really simple stuff:

import json
import glob

result = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "wb") as outfile:
    json.dump(result, outfile)

file 1 is--

{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3402}
{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3389}

file 2 is--

{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3402}
{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3389}

Both files contain the same records, two per file, with one JSON object per line. How do I merge these two files into one single JSON file?

I am getting the below error--

 JSONDecodeError: Extra data: line 2 column 1 (char 62)

with the traceback as--

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-2-d33a95f39988> in <module>
  5 for f in glob.glob("*.json"):
  6     with open(f, "rb") as infile:
----> 7         result.append(json.load(infile))
  8 
  9 with open("merged_file.json", "wb") as outfile:

~\anaconda3\lib\json\__init__.py in load(fp, cls, object_hook, parse_float, parse_int, 
parse_constant, object_pairs_hook, **kw)
294         cls=cls, object_hook=object_hook,
295         parse_float=parse_float, parse_int=parse_int,
--> 296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
297 
298 

~\anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, 
parse_constant, object_pairs_hook, **kw)
346             parse_int is None and parse_float is None and
347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
349     if cls is None:
350         cls = JSONDecoder

~\anaconda3\lib\json\decoder.py in decode(self, s, _w)
338         end = _w(s, end).end()
339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
341         return obj

Please HELP :(

  • The file content provided is not valid JSON. It would need to be either a list [...] with the items {} separated by commas, or a dictionary with unique keys. Commented Jan 18, 2021 at 1:07
  • Is there any way to handle the present file format? @NicLaforge Commented Jan 18, 2021 at 1:08
  • Depending on what you need you can read the file line by line. I'll provide you with a simple Commented Jan 18, 2021 at 1:36

1 Answer

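The file format is incorrect. Each file holds several top-level JSON objects, one per line, while a valid JSON document must be a single value: either a list of dictionaries or one dictionary containing unique keys. That is why json.load parses the first object and then raises "Extra data" when it reaches line 2.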
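
To make the cause of the error concrete, here is a minimal sketch that reproduces it on an in-memory string shaped like the files in the question. The variable names (text, error_message, records) are only illustrative and do not come from the original code.

```python
import json

# Two JSON objects, one per line, as in the question's files
text = (
    '{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3402}\n'
    '{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3389}\n'
)

# Parsing the whole text as one JSON document fails after the first object
try:
    json.loads(text)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# Parsing line by line succeeds, because each line is valid JSON on its own
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print(error_message, len(records))
```

The same text is rejected as a whole but accepted line by line, which is the idea behind both approaches in this answer.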

If you cannot modify the files, you can read the raw content of each file and concatenate it into a single string:

import glob

result = ''
for f in glob.glob("*.json"):
    # Read each file as text and append its raw content
    with open(f, "r") as infile:
        result += infile.read()

Then write the final result into another file:

with open("merged_file.json", "w") as outfile:
    # result is a single string, so write() is enough
    outfile.write(result)

Output:

{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3402}
{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3389}
{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3402}
{"playlist_track.PlaylistId":1,"playlist_track.TrackId":3389}

With the above solution, I would definitely change the output file's extension to .txt or something else that is not .json, because the merged content is still not valid JSON.
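
One pitfall of raw concatenation is that a file without a trailing newline would glue its last record onto the first record of the next file. Also, if the output name matches the glob pattern, a second run would read the merged file as an input. A minimal sketch that avoids both, assuming a hypothetical helper name concat_lines (not part of the original answer):

```python
import glob
import os


def concat_lines(pattern, out_path):
    """Copy every non-empty line of the matching files into out_path.

    Hypothetical helper; returns the number of lines written.
    """
    out_abs = os.path.abspath(out_path)
    # Resolve the inputs first, so the output file is never read as an input
    sources = [p for p in sorted(glob.glob(pattern))
               if os.path.abspath(p) != out_abs]
    count = 0
    with open(out_path, "w") as outfile:
        for path in sources:
            with open(path, "r") as infile:
                for line in infile:
                    line = line.rstrip("\n")
                    if line.strip():
                        # Always terminate each record with one newline
                        outfile.write(line + "\n")
                        count += 1
    return count
```

It could be called as concat_lines("*.json", "merged_file.txt"). Because it copies line by line, it never holds more than one line in memory, which may matter for very large inputs.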

My recommendation would be to convert the content into valid JSON and save it that way.

Read each line and convert it into a dict. The result will be a list of dicts, which is JSON serializable:

import glob
import json

result = []
for f in glob.glob("*.json"):
    with open(f, "r") as infile:
        for line in infile:
            if line.strip():  # skip empty lines
                result.append(json.loads(line))

Once this is done, you can save the content as a JSON file:

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)

You can now open the file as a JSON file:

from pprint import pformat

with open("merged_file.json", "r") as fp:
    print(pformat(json.load(fp)))

Content of merged_file.json:

[{"playlist_track.PlaylistId": 1, "playlist_track.TrackId": 3402}, {"playlist_track.PlaylistId": 1, "playlist_track.TrackId": 3389}, {"playlist_track.PlaylistId": 1, "playlist_track.TrackId": 3402}, {"playlist_track.PlaylistId": 1, "playlist_track.TrackId": 3389}]
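
If some lines may be empty or corrupt, json.loads raises on the first bad one and the whole merge stops. A minimal sketch of a more tolerant parser, assuming a hypothetical function name parse_json_lines (not part of the original answer), which skips blank lines and collects the failing ones for inspection instead of aborting:

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of text lines, one JSON value per line.

    Hypothetical helper. Returns (records, errors), where errors holds
    (line_number, message) pairs for the lines that failed to parse.
    """
    records, errors = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return records, errors
```

An open file can be passed directly, for example records, errors = parse_json_lines(infile). Inspecting errors afterwards shows which line numbers need attention; a line holding two objects side by side, for instance, is reported with the message "Extra data".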

3 Comments

Thanks, this solution does work for combining the two files, but the second half, where you convert the file format, throws an error saying that the data has extra lines. Thanks for your time and effort :) @NicLaforge
Glad it worked. I tried the second part and I don't see your issue. You might want to wrap the json.loads(line) conversion in a try/except to make sure it works; this would avoid an issue with an empty line, for example. As for "the data has extra lines", I'm not sure where this is coming from.
Maybe it's some redundant or corrupt data, as I am dealing with millions of lines. I will try the try/except. Thanks again :) @NicLaforge
