I want to iteratively append pandas DataFrames to a csv file. This is usually not a problem. However, the DataFrames may not have all columns. Because appending without a header writes the values by position, a DataFrame with missing columns ends up under the wrong columns.

I start with

import csv

with open('test.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerow(['a','b', 'c'])

Then for example I add the DataFrame df

    a   b   c
0   2   2.0 3
1   2   NaN 3

using the command

df = pd.DataFrame([{'a':2, 'b':2, 'c':3}, {'a':2, 'c':3}])
df.to_csv('test.csv', index = False, header = False, mode = 'a') 

However, the next DataFrame that I want to append may look like

    a   c
0   1   1
1   1   1

When I append it, I do not want to write the header again because it already exists. Doing the same as before (as expected) does not work:

df =pd.DataFrame([{'a':1, 'c':1}, {'a':1, 'c':1}])
df.to_csv('test.csv', index = False, header = False, mode = 'a')

It yields

    a   b   c
0   2   2.0 3.0
1   2   NaN 3.0
2   1   1.0 NaN
3   1   1.0 NaN
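
To make the cause concrete, here is a minimal sketch that reproduces the shift in memory. The in-memory `content` string stands in for the file and is only an assumption for illustration; the point is that `to_csv` with `header = False` writes two fields per row, which are then read back against a three-column header.

```python
import io

import pandas as pd

df = pd.DataFrame([{'a': 1, 'c': 1}, {'a': 1, 'c': 1}])

# Without a header, each row is just the values in column order: two fields
appended = df.to_csv(index=False, header=False).splitlines()

# Simulate the file: a three-column header followed by the two-field rows
content = "a,b,c\n" + "\n".join(appended) + "\n"
result = pd.read_csv(io.StringIO(content))

# The value meant for 'c' lands under 'b', and 'c' is left empty
print(result)
```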

Of course I could read the existing csv into a DataFrame, append to it, and then overwrite the old file:

file = pd.read_csv('test.csv')
df =pd.DataFrame([{'a':1, 'c':1}, {'a':1, 'c':1}])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
file = pd.concat([file, df])
file.to_csv('test.csv', index = False, header = True)
pd.read_csv('test.csv')

This does exactly what I want

    a   b   c
0   2   2.0 3
1   2   NaN 3
2   1   NaN 1
3   1   NaN 1

But reading the entire csv file, appending in pandas, and overwriting the csv every time is bad for performance when I repeat the process many times. I want to write my intermediate results to a csv because all the aggregated data is lost if I only append to a pandas DataFrame in memory and then an error occurs. Are there any better solutions to my problem?

I also tried to add new empty columns, but they get added at the end, which doesn't help. It may, however, point toward a better-performing solution.

def append_to_csv(df, file):
    if not os.path.exists(file):
        df.to_csv(file, index = False, header = True)
    else:
        with open(file) as f:
            header = next(csv.reader(f))
        columns = df.columns
        for column in set(header) - set(columns):
            df[column] = np.nan
        df.to_csv(file, index = False, header = False, mode = 'a')

2 Answers

You can always append an empty column to the df like this:

In [958]: df['b']=''

Then re-structure the df like:

In [959]: df = df[['a','b','c']]

In [960]: df
Out[960]: 
   a b  c
0  1    1
1  1    1

Now, write it to csv.

In [961]: df.to_csv('test.csv', index = False, header = False, mode = 'a')
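
As a possible shortcut, the two steps (adding the empty column and re-ordering) can be sketched with a single `reindex` call. This is a variation on the approach above, not a different method: the missing column is filled with NaN instead of `''`, and `to_csv` writes NaN as an empty field by default. The name `aligned` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'c': 1}, {'a': 1, 'c': 1}])

# reindex adds the missing column 'b' (filled with NaN) and fixes the order
aligned = df.reindex(columns=['a', 'b', 'c'])

# NaN is written as an empty field by default (na_rep='')
rows = aligned.to_csv(index=False, header=False).splitlines()
print(rows)
```

Passing a file name and `mode = 'a'` to `to_csv`, as in the command above, appends the same rows to the csv.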

Let me know if this helps.

1 Comment

The restructuring is the key, thanks! Exactly what I needed.

Just for the sake of completeness, here is a function based on Mayank Porwal's answer. Use it whenever you want to append a DataFrame to a csv that already has a given header. If you want to allow new columns (not contained in the header), you need to modify the function.

def append_to_csv(df, file):
    with open(file) as f:
        header = next(csv.reader(f))
    columns = df.columns
    for column in set(header) - set(columns):
        df[column] = ''
    df = df[header]
    df.to_csv(file, index = False, header = False, mode = 'a')
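
A slightly more defensive variant might look like the sketch below. The name `append_aligned` is hypothetical, and two behaviours are assumptions of mine rather than part of the function above: the file is created with the DataFrame's own columns as header if it does not exist yet, and columns that are not in the header raise an error instead of being dropped. It uses `reindex`, so the DataFrame passed in is not modified.

```python
import csv
import os
import tempfile

import pandas as pd


def append_aligned(df, path):
    """Append df to the csv at path, aligned to the file's header row."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # No usable file yet: this frame's columns become the header
        df.to_csv(path, index=False, header=True)
        return
    with open(path, newline='') as f:
        header = next(csv.reader(f))
    extra = [c for c in df.columns if c not in header]
    if extra:
        # Assumption: unknown columns are an error, not silently dropped
        raise ValueError(f"columns not in csv header: {extra}")
    # reindex returns a new frame, so the caller's df is left unchanged
    aligned = df.reindex(columns=header)
    aligned.to_csv(path, index=False, header=False, mode='a')


# Usage with a temporary file
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
first = pd.DataFrame([{'a': 2, 'b': 2, 'c': 3}, {'a': 2, 'c': 3}])
small = pd.DataFrame([{'a': 1, 'c': 1}, {'a': 1, 'c': 1}])
append_aligned(first, path)
append_aligned(small, path)
result = pd.read_csv(path)
print(result)
```

Only the header line is read on each call, so the cost of an append does not grow with the size of the file.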
