Append pandas DataFrame to csv with fixed header

Question

I want to iteratively append pandas DataFrames to a CSV file. This is usually not a problem, but the DataFrames may not contain all columns, so a plain append writes values into the wrong columns.

I start with

import csv

with open('test.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerow(['a','b', 'c'])

Then for example I add the DataFrame df

    a   b   c
0   2   2.0 3
1   2   NaN 3

using the command

import pandas as pd

df = pd.DataFrame([{'a':2, 'b':2, 'c':3}, {'a':2, 'c':3}])
df.to_csv('test.csv', index = False, header = False, mode = 'a')

However, the next DataFrame that I want to append may look like

    a   c
0   1   1
1   1   1

When I append it, I do not want to write the header again because it already exists. Doing the same as before does not work (as expected):

df = pd.DataFrame([{'a':1, 'c':1}, {'a':1, 'c':1}])
df.to_csv('test.csv', index = False, header = False, mode = 'a')

It yields

    a   b   c
0   2   2.0 3.0
1   2   NaN 3.0
2   1   1.0 NaN
3   1   1.0 NaN
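
A minimal sketch of why the values shift, assuming the same two-column DataFrame as above: with header = False, to_csv writes values by position only, so a row with two fields fills the first two header columns regardless of the column names. The variable names rows and lines are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'c': 1}, {'a': 1, 'c': 1}])

# Without a path argument, to_csv returns the text it would have written.
rows = df.to_csv(index=False, header=False)
lines = rows.splitlines()

# Each line carries only two fields, so when the file is read back
# the value of 'c' lands in the second position, under 'b'.
for line in lines:
    print(line.split(','))
```

Nothing in the written rows records which column a value came from, so the alignment has to be fixed before writing.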

Of course I could read the existing CSV into a DataFrame, append to it, and then overwrite the old file:

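file = pd.read_csv('test.csv')
df = pd.DataFrame([{'a':1, 'c':1}, {'a':1, 'c':1}])
# DataFrame.append was removed in pandas 2.0; pd.concat aligns columns by name
file = pd.concat([file, df], ignore_index = True)
file.to_csv('test.csv', index = False, header = True)
pd.read_csv('test.csv')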

This does exactly what I want

    a   b   c
0   2   2.0 3
1   2   NaN 3
2   1   NaN 1
3   1   NaN 1

However, reading the entire CSV file, appending in pandas, and overwriting the file each time performs badly when the process is repeated many times. I still want to write intermediate results to a CSV file, because if I only append to an in-memory DataFrame and an error occurs, all the aggregated data is lost. Is there a better solution to my problem?

I also tried adding the missing columns as empty columns, but they are added at the end, which doesn't solve the problem. It may still help to find a better-performing solution.

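import csv
import os

import numpy as np

def append_to_csv(df, file):
    if not os.path.exists(file):
        # first call: write the DataFrame together with its header
        df.to_csv(file, index = False, header = True)
    else:
        with open(file) as f:
            header = next(csv.reader(f))
        columns = df.columns
        # add the header columns missing from df; they end up last
        for column in set(header) - set(columns):
            df[column] = np.nan
        df.to_csv(file, index = False, header = False, mode = 'a')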

Mayank Porwal · Accepted Answer · 2018-11-03 12:29:30Z

2

You can always append an empty column to the df like this:

In [958]: df['b']=''

Then re-structure the df like:

In [959]: df = df[['a','b','c']]

In [960]: df
Out[960]: 
   a b  c
0  1    1
1  1    1

Now, write it to csv.

In [961]: df.to_csv('test.csv', index = False, header = False, mode = 'a')

Let me know if this helps.
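
As a hedged alternative to the two steps above, DataFrame.reindex can add the missing column and restore the column order in one call. This is only a sketch: the header list is assumed to match the first line of test.csv, and the missing values are written as empty fields because to_csv renders NaN as an empty string by default.

```python
import pandas as pd

header = ['a', 'b', 'c']  # assumption: same as the header line of test.csv
df = pd.DataFrame([{'a': 1, 'c': 1}, {'a': 1, 'c': 1}])

# reindex inserts the missing column 'b' (filled with NaN)
# and orders the columns exactly like header.
aligned = df.reindex(columns=header)

# Inspect the rows that an append with header=False would write.
lines = aligned.to_csv(index=False, header=False).splitlines()
print(lines)
```

Note that reindex also silently drops any column of df that is not listed in header, so it only fits the case where the header is fixed.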


1 Comment

Valentin Over a year ago

The restructuring is the key, thanks! Exactly what I needed.

Valentin · Accepted Answer · 2018-11-03 12:57:04Z

1

Just for the sake of completeness, here is the function based on Mayank Porwal's answer. Use it whenever you want to append a DataFrame to a CSV file with a specified header. If you want to allow new columns (not contained in the header), you need to modify the function.

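import csv

def append_to_csv(df, file):
    # read the existing header line of the csv
    with open(file, newline = '') as f:
        header = next(csv.reader(f))
    columns = df.columns
    # add every header column missing from df as an empty column
    for column in set(header) - set(columns):
        df[column] = ''
    # restructure df so its columns follow the order of the header
    df = df[header]
    df.to_csv(file, index = False, header = False, mode = 'a')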
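
A possible variant, sketched under a few assumptions: it creates the file on the first call, leaves the caller's DataFrame unmodified, and raises an error for columns that are not in the header instead of dropping them. The names append_aligned and path are hypothetical, and the sketch assumes an existing file is non-empty and starts with a header line.

```python
import csv
import os

import pandas as pd


def append_aligned(df, path):
    """Append df to the CSV at path, aligned to the header already in the file."""
    if not os.path.exists(path):
        # First write: the DataFrame's own columns become the header.
        df.to_csv(path, index=False, header=True)
        return
    with open(path, newline='') as f:
        header = next(csv.reader(f))
    # Refuse columns the file does not know, rather than losing them silently.
    extra = [c for c in df.columns if c not in header]
    if extra:
        raise ValueError(f"columns not in CSV header: {extra}")
    # reindex returns a new DataFrame; missing columns are written as empty fields.
    df.reindex(columns=header).to_csv(path, index=False, header=False, mode='a')
```

Calling it first with the three-column DataFrame and then with the two-column one from the question should leave 'b' empty for the later rows while 'a' and 'c' stay in their own columns.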
