Delete specific rows and columns from csv using Python in one step

Question

I have a CSV file from which I need to delete the second and third rows and the 3rd to 18th columns. I was able to get it to work in two steps, which produced an interim file, but I think there must be a better and more compact way to do this. Any suggestions would be really appreciated.

Also, if I want to remove multiple ranges of columns, how do I specify them in this code? For example, if I want to remove columns 25 to 29 in addition to columns 3 to 18, how would I add that to the code? Thanks

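import csv

# Zero-based slice: index 2 is the 3rd column and the end index is
# exclusive, so [2:18] removes the 3rd through 18th columns
remove_from = 2
remove_to = 18

# Pass 1: drop the columns and write an interim file
# (Python 3: open in text mode with newline='' for the csv module)
with open('file_a.csv', newline='') as infile, open('interim.csv', 'w', newline='') as outfile:

    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    for row in reader:
        del row[remove_from:remove_to]
        writer.writerow(row)

# Pass 2: copy the header, skip rows 2 and 3, copy the rest
with open('interim.csv', newline='') as infile, open('file_b.csv', 'w', newline='') as outfile:

    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    writer.writerow(next(reader))  # header row

    next(reader)  # skip row 2
    next(reader)  # skip row 3

    for row in reader:
        writer.writerow(row)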
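The two passes above can be collapsed into one, without pandas and without an interim file. This is a minimal sketch using only the standard `csv` module; the helper names `drop_rows_and_columns` and `process_file` are hypothetical, and the row and column numbers are assumed to be the 1-based, inclusive ones from the question (rows 2 and 3, columns 3 to 18 and 25 to 29). Each extra column range is just another tuple in the list.

```python
import csv

# 1-based, inclusive column ranges to drop (assumed from the question)
DROP_COLUMNS = [(3, 18), (25, 29)]
# 1-based record numbers to drop; record 1 is the header and is kept
DROP_ROWS = {2, 3}


def drop_rows_and_columns(rows, drop_rows, drop_columns):
    """Yield each row except the dropped ones, minus the dropped columns."""
    # Turn the 1-based inclusive ranges into a set of 0-based indices
    drop_idx = {i - 1 for start, end in drop_columns for i in range(start, end + 1)}
    for row_no, row in enumerate(rows, start=1):
        if row_no in drop_rows:
            continue
        yield [value for i, value in enumerate(row) if i not in drop_idx]


def process_file(src, dst, drop_rows=DROP_ROWS, drop_columns=DROP_COLUMNS):
    """Read src and write the filtered rows to dst in a single pass."""
    with open(src, newline="") as infile, open(dst, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        writer.writerows(drop_rows_and_columns(reader, drop_rows, drop_columns))


# Hypothetical usage with the file names from the question:
# process_file("file_a.csv", "file_b.csv")
```

Because the reader is consumed lazily by a generator, only one row is held in memory at a time, so this also works for files larger than RAM.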

Yes, I would like to know how to do this both with and without pandas.
– RJL, Commented Jun 8, 2018 at 22:35
There is hardly a more efficient way than using a temp file and then overwriting the original, but if you insist, here is how to edit a file in-place (it also includes various examples of in-place and temp-file approaches and comprehensive benchmarks).
– zwer, Commented Jun 8, 2018 at 22:47
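The temp-file-then-overwrite approach mentioned in the comment above can be sketched with the standard library alone. This is a minimal sketch, not the code from the linked benchmarks; `rewrite_csv_in_place` and its `transform` argument are hypothetical names, and `transform` is assumed to be any function that takes a `csv.reader` and returns the rows to write.

```python
import csv
import os
import tempfile


def rewrite_csv_in_place(path, transform):
    """Apply transform (rows -> rows) to the CSV at path via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        # Wrap fd first so it is always closed, even if opening path fails
        with os.fdopen(fd, "w", newline="") as outfile, open(path, newline="") as infile:
            csv.writer(outfile).writerows(transform(csv.reader(infile)))
        # Swap the finished temp file over the original
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file and re-raise
        os.remove(tmp_path)
        raise


# Hypothetical usage: keep only the first two columns of file_a.csv
# rewrite_csv_in_place("file_a.csv", lambda rows: (row[:2] for row in rows))
```

The original file is only replaced after the new one has been completely written, so an exception midway leaves the original CSV untouched.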

Anton vBR · Accepted Answer · 2018-06-08 22:59:39Z

Here is a pandas approach:

Step 1, creating a sample dataframe

import numpy as np
import pandas as pd

# Create a sample CSV file (100 rows x 100 columns)
df = pd.DataFrame(np.arange(10000).reshape(100, 100))
df.to_csv('test.csv', index=False)

Step 2, doing the magic

import pandas as pd
import numpy as np

# Read only the header row to find out how many columns there are
size = pd.read_csv('test.csv', nrows=0).shape[1]

# Remove the 3rd to 18th and the 25th to 29th columns.
# np.r_ turns the slices into one array of zero-based positions
# (2..17 and 24..28); np.delete removes them from 0..size-1,
# leaving the positions of the columns to keep
ranges = np.r_[2:18, 24:29]
ar = np.delete(np.arange(size), ranges)

# Now read the CSV, keeping only those columns.
# skiprows takes zero-based line numbers (0 is the header),
# so [1, 2] skips the 2nd and 3rd rows of the file
df = pd.read_csv('test.csv', skiprows=[1, 2], usecols=ar)

# And write the output
df.to_csv('output.csv', index=False)
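To check what `np.r_`, `np.delete`, `skiprows` and `usecols` do before running on the real file, here is the same approach on a small hypothetical CSV held in an `io.StringIO`. It relies on pandas treating a list passed to `skiprows` as zero-based line numbers counted from the top of the file, header included; the data and the dropped ranges are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data: a header plus four data rows, six columns
csv_text = """a,b,c,d,e,f
0,1,2,3,4,5
10,11,12,13,14,15
20,21,22,23,24,25
30,31,32,33,34,35
"""

# Count the columns from the header alone
size = pd.read_csv(io.StringIO(csv_text), nrows=0).shape[1]

# Two ranges of zero-based positions to drop: 2..3 (c, d) and 5 (f)
drop = np.r_[2:4, 5:6]
keep = np.delete(np.arange(size), drop)

# skiprows=[1, 2] skips the 2nd and 3rd lines of the file (line 0 is the header)
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], usecols=keep.tolist())
print(df)
```

The result keeps columns `a`, `b` and `e`, and only the data rows starting with 20 and 30. Adding another range to drop is just another slice inside `np.r_[...]`.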

answered Jun 8, 2018 at 22:54 by Anton vBR (edited Jun 8, 2018 at 22:59)

3 Comments

RJL Over a year ago

Thank you very much for the detailed answer!

Anton vBR Over a year ago

@RJL If you are happy you can accept the answer. If not you can maybe point out what is missing :)

RJL Over a year ago

Oops, sorry, I missed it. I normally just upvote, and now I see I also need to click the green check mark. Thank you!
