I'm trying to insert or update documents in MongoDB based on the contents of a CSV file. If the value in the first column, customer_id, doesn't exist in the collection, the script should create a new document; if it does exist, the script should update all the values in that document.

I have built a script that looks for the customer_id and creates a new document if it doesn't exist, but I'm having trouble getting the update part working.

Do I have to specify each header that needs to be updated, or is there a more efficient way to update using the headers from the CSV, so that the script wouldn't need to change if new headers are added later?

import csv
from pymongo import MongoClient

conn = MongoClient('localhost', 27017)

db = conn.shipping
collection = db.sales

file = csv.reader(open("shipping_list.csv"), delimiter=',')

header = ["customer_id", "customer_name", "sales_rep", "purchase_date", "region", "purchase_price", "shipping_status", "products_purchased"]

for each in file:
    # Map each column value to its header name
    row = {}
    for n in range(0, len(header)):
        row[header[n]] = each[n]

    if collection.count_documents({'customer_id': each[0]}) == 0:
        # No document for this customer_id yet, so create one
        collection.insert_one(row)
    else:
        # This is the update step that isn't working
        collection.update({'customer_id': each[0]}, row)
  • Do you insist on Python? If not, have a look at mongoimport. Commented Apr 19, 2021 at 15:04
  • I honestly hadn't looked at using something outside of Python. From what I just read, using mongoimport would look like: mongoimport -d shipping -c sales --upsert --upsertFields customer_id --file shipping_list.csv Commented Apr 19, 2021 at 15:29
  • I think you are missing the options --mode=upsert --headerline --type=csv. You may also tune the date format of purchase_date with the option --columnsHaveTypes. Have a look at the example at the bottom of the documentation page. Commented Apr 19, 2021 at 18:10

1 Answer

If you want to use pymongo, you can make your code much simpler by using pandas and read_csv(). You only have to specify the key column, so you can add more columns to the CSV without changing the code. Use parse_dates if you want to store dates as proper datetime values rather than strings.

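import pandas as pd
from pymongo import MongoClient

db = MongoClient()['mydatabase']

# The only column the code needs to know by name
key = 'customer_id'
df = pd.read_csv('csv_pandas_mongo.csv', parse_dates=['purchase_date'])

# One dict per CSV row, keyed by the column headers
for row in df.to_dict('records'):
    # Update the matching document, or insert it if none exists
    db.mycollection.update_one({key: row.get(key)}, {'$set': row}, upsert=True)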
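
If you would rather not depend on pandas, the same idea can be sketched with the standard library's csv.DictReader, which also takes the field names from the first row of the file. This is only a rough sketch under a few assumptions: the helper name build_upserts and the sample data are made up for illustration, the CSV is assumed to have a header row, and every value (including customer_id) stays a string because the csv module does no type conversion.

```python
import csv
import io


def build_upserts(csv_file, key="customer_id"):
    """Yield one (filter, update) pair per CSV row, keyed on `key`.

    Hypothetical helper for illustration; not part of pymongo.
    """
    reader = csv.DictReader(csv_file)  # first row supplies the field names
    for row in reader:
        # $set overwrites the listed fields and leaves any others untouched
        yield {key: row[key]}, {"$set": dict(row)}


# Made-up sample data standing in for shipping_list.csv
sample = io.StringIO(
    "customer_id,customer_name,region\n"
    "1001,Acme,West\n"
    "1002,Globex,East\n"
)
operations = list(build_upserts(sample))

# With a live collection, each pair would be applied as:
#     collection.update_one(query, update, upsert=True)
```

Because the field names come from the file itself, a column added to the CSV later ends up in the $set document without any change to the code.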

1 Comment

This worked for what I was looking for. I tested both the PyMongo and the mongoimport approaches recommended here. Python seemed to parse the large CSV file and update MongoDB faster than mongoimport did.
