I have a Python script that reads the contents of a JSON file and imports them into a MongoDB database.
I am getting the following error from it:
Traceback (most recent call last):
  File "/home/luke/projects/vuln_backend/vuln_backend/mongodb.py", line 39, in process_files
    file_content = currentFile.read()
  File "/home/luke/envs/vuln_backend/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 14: invalid continuation byte
This is the code:
import json
import logging
import logging.handlers
import os
import glob
from logging.config import fileConfig
from zipfile import ZipFile
from pymongo import MongoClient


def process_files():
    try:
        client = MongoClient('5.57.62.97', 27017)
        db = client['vuln_sets']
        coll = db['vulnerabilities']
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        archive_filepath = filepath + '/vuln_files/'
        archive_files = glob.glob(archive_filepath + "/*.zip")
        for file in archive_files:
            with open(file, "r") as currentFile:
                file_content = currentFile.read()
                vuln_content = json.loads(file_content)
                for item in vuln_content:
                    coll.insert(item)
    except Exception as e:
        logging.exception(e)
I have tried setting the encoding to UTF-8 and to Windows-1252, but neither lets me read the JSON.
How can I get it to determine which encoding is used in the JSON?
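One thing that may be worth checking before guessing at encodings: the glob in the code above matches `*.zip`, so the bytes passed to the decoder could be compressed archive data rather than JSON text. The following is only a minimal sketch of how that could be tested. It assumes each archive contains `.json` members encoded as UTF-8, and the helper names `looks_like_zip` and `load_json_members` are hypothetical, not part of the script above.

```python
import json
import zipfile


def looks_like_zip(path):
    """Return True if the file at `path` is a ZIP archive rather than plain text."""
    return zipfile.is_zipfile(path)


def load_json_members(path, encoding="utf-8"):
    """Yield (member_name, parsed_json) for every .json member of a ZIP archive.

    Assumption: each .json member is a JSON document encoded as `encoding`.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                raw = archive.read(name)  # decompressed bytes of this member
                yield name, json.loads(raw.decode(encoding))
```

If `looks_like_zip(file)` returns True for the files found by the glob, the question of which text encoding to pass to `open()` does not apply to the archive itself, only to the members inside it.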