I have a Python script that reads the contents of a JSON file and imports them into MongoDB.

I am getting the following error from it:

Traceback (most recent call last):
  File "/home/luke/projects/vuln_backend/vuln_backend/mongodb.py", line 39, in process_files
    file_content = currentFile.read()
  File "/home/luke/envs/vuln_backend/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 14: invalid continuation byte

This is the code:

import json
import logging
import logging.handlers
import os
import glob
from logging.config import fileConfig
from zipfile import ZipFile
from pymongo import MongoClient


def process_files():
    try:
        client = MongoClient('5.57.62.97', 27017)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + '/vuln_files/'
        archive_files = glob.glob(archive_filepath + "/*.zip")

        for file in archive_files:
            with open(file, "r") as currentFile:
                file_content = currentFile.read()
                vuln_content = json.loads(file_content)
            for item in vuln_content:
                coll.insert(item)
    except Exception as e:
        logging.exception(e)

I have tried setting the encoding to UTF-8 and to Windows-1252, but neither seems able to read the JSON.

How can I get it to determine which encoding is used in the JSON?

  • Well, you have to unzip your file before reading it... You even import the module but don't use it. Commented Oct 26, 2017 at 10:03
  • I've been staring at this for a couple of hours... I completely forgot to add that code in! I think I've gone code blind! Thank you for pointing this out! Commented Oct 26, 2017 at 10:04
  • "Fresh pair of eyes"... as they say. You're welcome. Commented Oct 26, 2017 at 10:05
  • Can you put that as an answer so I can accept it and give the reputation? Commented Oct 26, 2017 at 10:41
  • Done, thanks. I've also tried to substantiate my answer with other "best programming" tips. Hope they help. Commented Oct 26, 2017 at 10:48

1 Answer

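Notice that you are opening a zipped file in text mode and trying to parse it as JSON. The archive's compressed bytes are not valid UTF-8, which is why read() raises UnicodeDecodeError before json.loads is even reached. You have to unzip the file first, which you can do with the zipfile module, like this: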

with ZipFile(file, 'r') as f:
    f.extractall(dest)

Where file is the loop variable and dest is the directory you want the archive extracted into.
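If you would rather not write the extracted files to disk, the archive members can also be parsed in place. The following is only a minimal sketch, assuming each archive contains one or more members whose names end in .json; the helper name load_json_from_zip is hypothetical and not part of the original script.

```python
import json
from zipfile import ZipFile


def load_json_from_zip(zip_path):
    """Return a list of (member_name, parsed_json) for every .json member.

    Hypothetical helper for illustration; not part of the original script.
    """
    documents = []
    with ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue  # skip directories and non-JSON members
            # ZipFile.open returns a binary file object. json.load accepts
            # bytes input and detects UTF-8, UTF-16 or UTF-32 (Python 3.6+).
            with archive.open(name) as member:
                documents.append((name, json.load(member)))
    return documents
```

In the loop from the question, you could call load_json_from_zip(file) for each archive and insert the items of every parsed document into the collection. Note that this only handles the Unicode encodings listed in the comment; a file in Windows-1252, for example, would still need to be decoded explicitly.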

Furthermore, when reading a JSON file, I'd recommend using json.load(fileobj) (1 step) over reading the file contents and then calling json.loads(string_from_file) on the string (2 steps).
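To make the difference concrete, here is a small sketch of both approaches applied to an already extracted JSON file. The function names are hypothetical, and the explicit UTF-8 encoding in the two-step version is an assumption about the file. Opening the file in binary mode in the one-step version lets the json module work out the Unicode encoding itself (Python 3.6+).

```python
import json


def parse_two_steps(path):
    # Read the whole file into a string, then parse the string.
    # The encoding is assumed here; a wrong guess raises UnicodeDecodeError.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    return json.loads(text)


def parse_one_step(path):
    # Hand the file object straight to json.load. In binary mode the
    # json module detects UTF-8, UTF-16 or UTF-32 from the bytes.
    with open(path, "rb") as f:
        return json.load(f)
```

Both functions return the same Python object for a valid UTF-8 file; the one-step version is simply shorter and leaves the encoding detection to the library.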
