I'm reading a PostgreSQL table, extracting the data, and loading it into a CSV file. Tables up to about 5 GB export successfully, but one of my tables is 35 GB and the process gets killed before the CSV file is created.
I suspect the pandas DataFrame can't hold a table that large in memory.
What can I do to overcome this and create the CSV file successfully?
import sys

import pandas as pd
import psycopg2


def table_to_csv(sql, file_path, dbname, port, user):
    """Create a CSV file from a PostgreSQL query."""
    try:
        conn = psycopg2.connect(dbname=dbname, port=port, user=user)
        print("Connecting to Database")
        # Load the query result into a pandas DataFrame
        df = pd.read_sql(sql, conn)
        # Write the DataFrame to a CSV file
        df.to_csv(file_path, encoding='utf-8', header=True,
                  doublequote=True, sep=',', index=False)
        print("CSV File has been created")
        conn.close()
    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)
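
Would streaming the result straight to the file with PostgreSQL's COPY, skipping the DataFrame entirely, be the right direction? A rough sketch of what I mean, assuming psycopg2's copy_expert and that sql is a plain SELECT with no trailing semicolon:

def table_to_csv_stream(sql, file_path, dbname, port, user):
    """Stream query results directly into a CSV file, without a DataFrame."""
    # Wrap the original query in a COPY ... TO STDOUT statement
    copy_sql = "COPY ({}) TO STDOUT WITH CSV HEADER".format(sql)
    conn = psycopg2.connect(dbname=dbname, port=port, user=user)
    try:
        with conn.cursor() as cur, open(file_path, 'w', encoding='utf-8') as f:
            # copy_expert writes rows to the file object as the server sends them,
            # so memory use should stay roughly flat regardless of table size
            cur.copy_expert(copy_sql, f)
    finally:
        conn.close()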