I promise I searched and read several pages of Google results before making this post. Due diligence has been done, I swear.
I am trying to open a CSV file in python, read the file, make changes to it, and then write out a new file.
I got this far:
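```python
import csv

def water_data():
    # newline='' is what the csv module docs recommend for files it reads or writes.
    # Opening the output in the same `with` block guarantees it is closed (and flushed).
    with open('aquastat.csv', 'r', newline='') as csv_file, \
         open('final_water.data.csv', 'w', newline='') as final_file:
        csv_reader = csv.reader(csv_file)
        csv_writer = csv.writer(final_file, delimiter="\t")  # tab-separated output
        # Copy every row unchanged (for now).
        for row in csv_reader:
            csv_writer.writerow(row)

water_data()
```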
But I'm struggling to get any further. I want to remove certain columns, but I cannot understand how Python will know the difference between a row and a column. For example, the columns are Area, Area Id, Variable Name, Variable Id, Year, Value, etc., and I only want Area, Year and Value. I tried:
```python
for row in final_file:
    final_file.writerow(row[0] + row[2] + row[4] + row[5])
```
but I kept getting the following error: IndexError: list index out of range
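To see how Python tells rows from columns: `csv.reader` hands you one row at a time, each as a list of strings, and a "column" is simply a position in that list. Indexes start at 0, so with this file's header Area is `row[0]`, Year is `row[4]` and Value is `row[5]`. A blank line in the file comes through as an empty list, and indexing into that is one common way to get `IndexError: list index out of range`. The sketch below assumes a small in-memory sample (`io.StringIO` standing in for the real file), and `WANTED` and `pick_columns` are hypothetical names for illustration.

```python
import csv
import io

# A small stand-in for aquastat.csv; the blank line is deliberate.
SAMPLE = (
    '"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""\n'
    '\n'
    '"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""\n'
)

# Hypothetical: 0-based positions of Area, Year and Value in each row.
WANTED = [0, 4, 5]


def pick_columns(row, indexes):
    """Return a new list containing only the cells at the given positions."""
    return [row[i] for i in indexes]


rows = list(csv.reader(io.StringIO(SAMPLE)))
print(rows[2])  # the blank line arrives as an empty list: []

selected = []
for row in rows:
    if not row:  # skip blank lines instead of indexing into []
        continue
    selected.append(pick_columns(row, WANTED))
print(selected)
```

Building a new list with the cells you want (rather than `row[0] + row[4] + ...`, which glues strings together) is what `writerow` expects.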
[I would also like to replace blank cells with `*`, but selecting the columns is the priority.]
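For the blank cells, one approach is to rebuild each row with a conditional expression: keep a cell if it contains something other than whitespace, otherwise put `*`. The helper name `fill_blanks` and the choice to treat whitespace-only cells as blank are assumptions in this sketch.

```python
import csv
import io


def fill_blanks(row, filler="*"):
    """Return a copy of row with empty or whitespace-only cells replaced by filler."""
    return [cell if cell.strip() else filler for cell in row]


# One data row from the question; its last two cells are empty strings.
line = '"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""\n'
row = next(csv.reader(io.StringIO(line)))
filled = fill_blanks(row)
print(filled)
```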
Note that I cannot use pandas.
If possible, I would really appreciate it if someone could not just give me the code but also explain it, so I can work the rest out myself.
TL;DR: How can I remove empty rows from the CSV file and write only certain columns into the new file?
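Putting the pieces together, a minimal sketch (assuming columns are chosen by 0-based position and the tab delimiter from the original code is kept) reads each row, skips rows that are empty or too short, keeps only the wanted cells, fills blanks with `*`, and writes the result. `write_selected` and `WANTED` are hypothetical names. The function takes already-open files so it can be demonstrated with `io.StringIO`; the third demo row is a modified copy of the question's data with Value blanked out, purely to show the `*` replacement.

```python
import csv
import io

# Hypothetical: 0-based positions of Area, Year and Value, taken from the header row.
WANTED = [0, 4, 5]


def write_selected(src, dst, indexes, delimiter="\t"):
    """Copy the chosen columns from src to dst, skipping empty rows.

    src and dst are already-open text files (or io.StringIO objects).
    Blank cells in the kept columns are replaced with "*".
    Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter=delimiter)
    count = 0
    for row in reader:
        # An empty row ([]) or one made only of blank cells is skipped.
        if not any(cell.strip() for cell in row):
            continue
        # Skip rows too short to have every wanted column (avoids IndexError).
        if len(row) <= max(indexes):
            continue
        kept = [row[i] if row[i].strip() else "*" for i in indexes]
        writer.writerow(kept)
        count += 1
    return count


# With the real files (names from the question) the call would be:
# with open("aquastat.csv", newline="") as src, \
#      open("final_water.data.csv", "w", newline="") as dst:
#     write_selected(src, dst, WANTED)

demo_in = io.StringIO(
    '"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""\n'
    '\n'
    # Modified copy of a question row: Value left empty to show the "*".
    '"Afghanistan",2,"Total area of the country",4100,1982,,"E","",""\n'
)
demo_out = io.StringIO()
written = write_selected(demo_in, demo_out, WANTED)
print(demo_out.getvalue())
```

Pass `delimiter=","` if the output should stay comma-separated, which is what the `.csv` extension usually implies.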
INPUT:
```
"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""
```
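The sample shows something worth checking: the header names 8 columns, but each data row has 9 fields (there is an extra trailing `""`). Picking columns by header name with `csv.DictReader` avoids counting positions by hand and tolerates that mismatch, because `DictReader` stores surplus values in a list under its `restkey` (`None` by default). This is a sketch using an in-memory copy of the first two rows; `SAMPLE`, `rows` and `records` are illustrative names.

```python
import csv
import io

SAMPLE = (
    '"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""\n'
    '"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""\n'
)

# Each row becomes a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# The 9th field has no header name, so DictReader puts it under the key None.
print(rows[0][None])

# Select columns by name instead of by index.
records = [(row["Area"], row["Year"], row["Value"]) for row in rows]
print(records)
```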
Outside Python, a shell one-liner that keeps only columns 1, 5 and 6 (Area, Year, Value):

```
cat File.csv | cut -d, -f1,5,6
```
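`cut -d, -f1,5,6` works on this sample because no quoted value contains a comma, but it splits on every comma and leaves the quote characters in the output. `csv.reader` understands CSV quoting. The row below is a hypothetical one (not from the question's data) with a comma inside a quoted name, made up only to show the difference.

```python
import csv
import io

# Hypothetical row (not from the question's data) with a comma inside quotes.
line = '"Korea, Republic of",117,"Total area of the country",4100,1977,10027.0,"E","",""\n'

naive = line.rstrip("\n").split(",")          # a plain comma split, as cut does
parsed = next(csv.reader(io.StringIO(line)))  # quote-aware parsing

print(naive[0], naive[4], naive[5])     # fields shifted, quotes kept
print(parsed[0], parsed[4], parsed[5])  # Area, Year, Value as intended
```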