I have multiple CSV files in one directory, none of which has a header row. I'm looking for a robust way to add the same headers to all files in the directory at once.

Sample.csv:

 John Doe    Guitar    4 units

Desired output after adding headers 'name', 'product', 'quantity':

 name       product    quantity 
John Doe    Guitar     4 units

So far I have found a way to add headers to a single file with pandas:

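from pandas import read_csv
# header=None keeps the first row as data instead of treating it as the header
df = read_csv('/path/to/my/file/Sample.csv', header=None)
df.columns = ['name', 'product', 'quantity']
# index=False avoids writing the row index as an extra column
df.to_csv('/path/to/my/file/output.csv', index=False)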

Now I guess I would have to add a loop that reads all files in my directory and adds the desired header row to each. Could someone help me with this step, or suggest an easier approach if possible? Thank you in advance.

I attempted to add a loop, but it throws an error:

import pandas as pd 
import os
import glob
from pandas import read_csv 
path = '/path/to/my/files/'
filelist = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list = []
frame = pd.DataFrame()
# Whenever I run the line below, it throws this error -> IndentationError: expected an indented block
for file in filelist:
    df2 = pd.read_csv(path+file)
    df2.columns = ['name', 'product', 'quantity']
    list.append(df2)
frame = pd.concat(list)

1 Answer

read_csv has a names parameter that you can use to set the column names.

If you want the same header on every CSV you read, pass the column names to the names parameter when you read each file:

df = pd.read_csv('test_.csv', names = ['name', 'product', 'quantity'])
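
To make the effect of names concrete, here is a minimal sketch that parses the same header-less text twice. It assumes comma-separated data and uses an in-memory io.StringIO buffer in place of a file; the first row comes from the question, the second row is made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a header-less CSV file (the second row is made up).
raw = "John Doe,Guitar,4 units\nJane Roe,Piano,1 unit\n"

# Default behaviour: the first row is consumed as the header.
without_names = pd.read_csv(io.StringIO(raw))

# With names=..., every row stays data and the given labels become the header.
with_names = pd.read_csv(io.StringIO(raw), names=["name", "product", "quantity"])

print(len(without_names))        # number of data rows when the first row became the header
print(len(with_names))           # number of data rows when names is passed
print(list(with_names.columns))  # the labels supplied through names
```

Without names (or header=None), the first record silently turns into column labels, which is easy to miss when every file has only a few rows.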

Now for your code. You are doing more than necessary here: you don't need to create an empty DataFrame at the beginning. Also, do not name your list "list". list is the name of a built-in type in Python, and reusing it shadows the built-in.

You also do not need to prepend the path to each file name; the list returned by glob already contains the full path you need.
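
A quick way to check this yourself is the sketch below. It creates a throwaway directory with two empty files (the file names a.csv and b.csv are hypothetical) and inspects what glob returns for a pattern that includes the directory.

```python
import glob
import os
import tempfile

# Throwaway directory with two empty CSV files, only for the demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.csv", "b.csv"):
        open(os.path.join(tmp_dir, name), "w").close()

    matches = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

    # Every match already starts with the directory that was searched.
    has_dir_prefix = [m.startswith(tmp_dir) for m in matches]
    base_names = [os.path.basename(m) for m in matches]

print(has_dir_prefix)
print(base_names)
```

Because each match already includes the directory, path + file would repeat the directory and point at a file that does not exist.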

Regarding the indentation error: make sure you are indenting consistently. This error sometimes happens if you indent one line with spaces and another with a tab. I would simply delete the indentation and add it back the same way on every line.
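
For reference, the message "expected an indented block" is what Python reports when the line after a for ...: header is not indented at all. The sketch below reproduces that case by compiling a small piece of source text; the variable names are only illustrative.

```python
# Source text with a for loop whose body is not indented.
bad_source = "for item in [1, 2]:\nprint(item)\n"

try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    # Python refuses to compile the loop because its body is not indented.
    error_name = type(exc).__name__

print(error_name)
```

If you paste a loop into an interactive prompt line by line, the same error can appear when the indented lines are entered without their leading whitespace.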

import glob
import os

import pandas as pd

path = '/path/to/my/files/'
filelist = glob.glob(os.path.join(path, '*.csv'))
df_list = []
for file in filelist:
    # glob already returns the path including the directory, so don't prepend it again
    df2 = pd.read_csv(file, names=['name', 'product', 'quantity'])
    # save the file back out with the header row, without the index column
    df2.to_csv(file, index=False)
    df_list.append(df2)
# optional: combine all files into a single DataFrame
frame = pd.concat(df_list)

There is also an even easier way to do this with a list comprehension. See below.

import glob
import os

import pandas as pd

path = '/path/to/my/files/'
filelist = glob.glob(os.path.join(path, '*.csv'))
frame = pd.concat([pd.read_csv(file, names=['name', 'product', 'quantity']) for file in filelist])
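
Note that the one-liner only builds the combined DataFrame in memory; it does not change the files on disk. If the goal is to write the header into every file, one possible sketch is the helper below. The function name add_headers and the first-line check are my own additions, not part of pandas, and the check assumes plain comma-separated files whose header would be written without quoting.

```python
import glob
import os

import pandas as pd

COLUMNS = ["name", "product", "quantity"]


def add_headers(directory, columns=COLUMNS):
    """Rewrite every .csv file in `directory` with `columns` as its header row.

    Returns the list of files that were rewritten.
    """
    updated = []
    for csv_path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # Peek at the first line so a second run does not add a second header.
        with open(csv_path, newline="") as handle:
            first_line = handle.readline().strip()
        if first_line == ",".join(columns):
            continue
        # names=... keeps every existing row as data and supplies the labels.
        df = pd.read_csv(csv_path, names=columns)
        # index=False keeps the row index out of the rewritten file.
        df.to_csv(csv_path, index=False)
        updated.append(csv_path)
    return updated
```

Calling add_headers('/path/to/my/files/') overwrites the files in place, so it is worth trying it on a copy of the directory first.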

6 Comments

Hi jawsem, could you please share some more information? I'm only a beginner with Python. I added some additional code above with my loop. However, it's hard for me to tell exactly where and how I should add the names parameter you mentioned. Thanks!
@Baobab1988 I added some additional details. Let me know if you have any questions. Regarding the initial post, the link has the documentation for the read_csv function you are using. Pandas is a well-documented library, so if you ever need help understanding what a function or method does, you can always refer to its documentation.
Hi jawsem, thank you for the detailed explanation! It seems to run with no errors now. However, I have one last question, if you don't mind. It doesn't modify my CSV files, so the headers are not present after running the script. Nevertheless, I can see the headers in the terminal when I type print(frame). Would you know how I can save the headers to my files in the specified path?
You can just add a to_csv call in your for loop. I added it to the post.
I tried adding df = frame followed by df.to_csv(path + "/*.csv"), and this partially worked. Partially, because it saved all the headers, but not in each of my CSV files; instead it created a single file named *.csv. Would you be able to help with this? Sorry for so many questions, but these are my first steps with Python. Thanks!