1

I have a csv file where I need to delete the second and the third row and 3rd to 18th column. I was able to do get it to work in two steps, which produced an interim file. I am thinking that there must be a better and more compact way to do this. Any suggestions would be really appreciated.

Also, if I want to remove multiple ranges of columns, how do I specify in this code. For example, if I want to remove columns 25 to 29, in addition to columns 3 to 18 already specified, how would I add to the code? Thanks

remove_from = 2
remove_to = 17

with open('file_a.csv', 'rb') as infile, open('interim.csv', 'wb') as outfile: 

    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    for row in reader:
        del row[remove_from : remove_to]
        writer.writerow(row)

with open('interim.csv', 'rb') as infile, open('file_b.csv', 'wb') as outfile:

    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    writer.writerow(next(reader))  

    reader.next()
    reader.next()

    for row in reader: 
        writer.writerow(row)
3
  • Are you open to use pandas? Commented Jun 8, 2018 at 22:27
  • yes, i would like to know how to do this in both pandas and not pandas. Commented Jun 8, 2018 at 22:35
  • There is hardly a more efficient way than using a temp file and then overwriting the original, but if you insist here is how to edit a file in-place (includes also various examples of in-place and temp-file approaches and comprehensive benchmarks). Commented Jun 8, 2018 at 22:47

1 Answer 1

2

Here is a pandas approach:

Step 1, creating a sample dataframe

import pandas as pd

# Create sample CSV-file (100x100)
df = pd.DataFrame(np.arange(10000).reshape(100,100))
df.to_csv('test.csv', index=False)

Step 2, doing the magic

import pandas as pd
import numpy as np

# Read first row to determine size of columns
size = pd.read_csv('test.csv',nrows=0).shape[1]

#want to remove columns 25 to 29, in addition to columns 3 to 18 already specified,
# Ok so let's create an array with the length of dataframe deleting the ranges
ranges = np.r_[3:19,25:30]
ar = np.delete(np.arange(size),ranges)

# Now let's read the dataframe
# let us also skip rows 2 and 3
df = pd.read_csv('test.csv', skiprows=[2,3], usecols=ar)

# And output
dt.to_csv('output.csv', index=False)

And the proof:

enter image description here

Sign up to request clarification or add additional context in comments.

3 Comments

Thank you very much for the detailed answer!
@RJL If you are happy you can accept the answer. If not you can maybe point out what is missing :)
oops sorry missed it. I normally just voted up and now I see I need the green check box. thank you!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.