Insert or Update Documents in MongoDB from a CSV using Python

Question

I'm trying to insert or update documents in MongoDB based on the contents of a CSV file. If the value in the first column, customer_id, doesn't exist in the collection, the script should create a new document; if it does exist, it should update all the values in the existing document.

I have a script that looks up the customer_id and creates a new document when it doesn't exist, but I'm having trouble getting the update part to work.

Do I have to specify each header that needs to be updated, or is there a more efficient way that uses the headers from the CSV itself? That way, if new headers are added later, the script wouldn't need to be changed to list them:

import csv
from pymongo import MongoClient
  
conn = MongoClient('localhost', 27017)

db = conn.shipping
collection = db.sales

file = csv.reader(open("shipping_list.csv"), delimiter=',')

header = ["customer_id", "customer_name", "sales_rep", "purchase_date", "region", "purchase_price", "shipping_status", "products_purchased"]

for each in file:
    if collection.count_documents({ 'customer_id': each[0] }) == 0:
        row={}
        for n in range(0,len(header)):
            row[header[n]] = each[n]
                 
        collection.insert_one(row)
    else:
        row={}
        for n in range(0,len(header)):
            row[header[n]] = each[n]
                 
        collection.update({'customer_id': each[0]}, row)

I honestly hadn't looked at using something outside of Python. From what I just looked at, using mongoimport would look like: mongoimport -d shipping -c sales --upsert --upsertFields customer_id --file shipping_list.csv
– texnoob, Commented Apr 19, 2021 at 15:29
I think you are missing the options --mode=upsert, --headerline and --type=csv. You can also tune the date format of purchase_date with the --columnsHaveTypes option. Have a look at the example at the bottom of the documentation page.
– Wernfried Domscheit, Commented Apr 19, 2021 at 18:10

Belly Buster · Accepted Answer · 2021-04-19 19:26:18Z

If you want to use pymongo, you can make your code much simpler by using pandas and read_csv(). You only have to specify the key column, so you can add more columns to the CSV without changing the code. Use parse_dates if you want to store dates as "proper" dates rather than strings.

import pandas as pd
from pymongo import MongoClient

db = MongoClient()['mydatabase']

key = 'customer_id'
df = pd.read_csv('csv_pandas_mongo.csv', parse_dates=['purchase_date'])

for row in df.to_dict('records'):
    db.mycollection.update_one({key: row.get(key)}, {'$set': row}, upsert=True)
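
If you would rather not depend on pandas, the same header-driven upsert can be sketched with the standard library's csv.DictReader, which takes the field names from the first line of the file. This is only a sketch under some assumptions: upsert_csv, the converters argument and FakeCollection are hypothetical names introduced here for illustration, the date format "%Y-%m-%d" is a guess at what the CSV contains, and the in-memory stand-in exists only so the example runs without a MongoDB server.

```python
import csv
import io
from datetime import datetime


def upsert_csv(collection, csv_file, key, converters=None):
    """Upsert every CSV row into `collection`, matching on the `key` column.

    `converters` (hypothetical helper argument) maps a column name to a
    function that turns the raw string into another type, e.g. a datetime.
    """
    converters = converters or {}
    count = 0
    # DictReader uses the first line as field names, so new columns in the
    # CSV are picked up without changing this code.
    for row in csv.DictReader(csv_file):
        for column, convert in converters.items():
            if row.get(column):  # skip missing or empty cells
                row[column] = convert(row[column])
        # $set updates the listed fields; upsert=True inserts the document
        # when no existing one matches the filter.
        collection.update_one({key: row[key]}, {"$set": row}, upsert=True)
        count += 1
    return count


class FakeCollection:
    """In-memory stand-in for a pymongo collection (illustration only)."""

    def __init__(self):
        self.docs = {}

    def update_one(self, filter, update, upsert=False):
        key_value = next(iter(filter.values()))
        if key_value in self.docs or upsert:
            self.docs.setdefault(key_value, {}).update(update["$set"])


# Made-up sample data; customer 1001 appears twice, so the second row updates it.
sample = io.StringIO(
    "customer_id,customer_name,purchase_date\n"
    "1001,Ada,2021-04-19\n"
    "1001,Ada L.,2021-04-20\n"
    "1002,Bob,2021-04-21\n"
)
fake = FakeCollection()
processed = upsert_csv(
    fake,
    sample,
    "customer_id",
    # Assumed date format; adjust to match the real file.
    converters={"purchase_date": lambda s: datetime.strptime(s, "%Y-%m-%d")},
)

# With a real server, pass a pymongo collection instead of the stand-in:
#   with open("shipping_list.csv", newline="") as f:
#       upsert_csv(MongoClient()["mydatabase"].mycollection, f, "customer_id")
```

Note that the csv module leaves every value as a string unless a converter is supplied, whereas pandas infers numeric types, so customer_id would be stored as "1001" here rather than 1001. Mixing the two approaches against the same collection would therefore create duplicate documents.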


texnoob Over a year ago

This worked for what I was looking for. I tested both the pymongo and mongoimport approaches that were recommended. Python seemed to parse the large CSV file and update MongoDB faster than mongoimport did.
