
I have a task: I have a directory that contains many text files. Each file has many lines, and each line has tab-delimited fields. I have to exclude some of the lines from these files by comparing the value in the first field with the values in another text file. The 'bad' lines (those that match) have to be copied to a new 'bad' file, and the 'good' lines (those that do not match) to another 'good' file. At the end I should have many new files ('good' and 'bad'). In other words, the script should parse each file in the directory, compare each line with the values in the other file, and copy the line into the matching new file. I wrote this:

import csv
import sys
import os

prefix = 'dna'
goodFiles = []
badFiles = []

fileList = os.listdir(sys.argv[1])

for f in fileList:
    absFile = os.path.join(os.path.abspath(sys.argv[1]), f )
    newBadF = "BADFile" + "_" + f
    badFile = open(newBadF,'w')
    newGoodF = "GOODFile" + "_" + f
    goodFile = open(newGoodF,'w')
    resultList = open(sys.argv[2], 'rb')
    convertList = list(resultList)
    with open(absFile, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t')
        for row in reader:
            for field in convertList:
                if row[0].lower() == field.strip():
                    badFile.writelines('"%s"\n' % row)
                    next
                else:
                    goodFile.writelines('"%s"\n' % row)
                    next

My script does not work :) It produces files where each line is a list, like this: "['342', '343', '344', '345', '346', '347', '348', '349', '350']", whereas the original files have a different format: no commas and no '[' or ']'. My question: how do I fix it so the new files have the same format as the original ones? Thanks

1 Answer


You can use a csv.writer in the same way you are using a csv.reader if you want the output to keep the same delimiter:

bad_writer = csv.writer(badFile, delimiter='\t')
good_writer = csv.writer(goodFile, delimiter='\t')
...
if row[0].lower() == field.strip():
    bad_writer.writerow(row)
else:
    good_writer.writerow(row)

etc.
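Putting the pieces together, here is a minimal sketch of the whole task, assuming Python 3 (where csv files are opened in text mode with newline='' instead of 'rb'). The function names load_bad_keys, split_file and main are hypothetical. It also lower-cases the values from the list file, which is an assumption; the original only lower-cases row[0]. Loading the list once into a set means each row is checked with a single lookup and written exactly once. In the question's nested loop, a row is written to the good file once for every list entry it does not match.

```python
import csv
import os
import sys


def load_bad_keys(path):
    """Read the exclusion list: one value per line, stripped and (assumption) lower-cased."""
    with open(path, newline='') as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def split_file(src_path, bad_keys, out_dir='.'):
    """Copy each tab-delimited row of src_path into BADFile_<name> or GOODFile_<name>."""
    name = os.path.basename(src_path)
    bad_path = os.path.join(out_dir, 'BADFile_' + name)
    good_path = os.path.join(out_dir, 'GOODFile_' + name)
    with open(src_path, newline='') as src, \
         open(bad_path, 'w', newline='') as bad, \
         open(good_path, 'w', newline='') as good:
        reader = csv.reader(src, delimiter='\t')
        # lineterminator='\n' keeps Unix line endings (the csv default is '\r\n').
        bad_writer = csv.writer(bad, delimiter='\t', lineterminator='\n')
        good_writer = csv.writer(good, delimiter='\t', lineterminator='\n')
        for row in reader:
            if not row:
                continue  # skip blank lines
            # One set lookup per row, so each row lands in exactly one output file.
            if row[0].lower() in bad_keys:
                bad_writer.writerow(row)
            else:
                good_writer.writerow(row)
    return bad_path, good_path


def main(src_dir, list_file):
    bad_keys = load_bad_keys(list_file)  # read the list once, not once per file
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path):
            split_file(path, bad_keys)


if __name__ == '__main__' and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be along the lines of `python split_files.py SOURCE_DIR LIST_FILE` (script name hypothetical). Note that csv.writer's default quoting may add quotes around fields that themselves contain quote characters or tabs.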

When you call

badFile.writelines('"%s"\n' % row)

the % format operator turns the row (a list) into its string representation, brackets, commas and quotes included:

>>> _list = [1,2,3]
>>> str(_list)
'[1, 2, 3]'
>>> 
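A small side-by-side comparison, assuming Python 3 and using io.StringIO in place of a real output file, shows the difference between the question's formatting and csv.writer:

```python
import csv
import io

row = ['342', '343', '344']

# What the question's code writes: the list's repr, wrapped in double quotes.
formatted = '"%s"\n' % row
print(formatted, end='')  # "['342', '343', '344']"

# What csv.writer writes: the fields joined by the delimiter.
buf = io.StringIO()
csv.writer(buf, delimiter='\t', lineterminator='\n').writerow(row)
print(buf.getvalue(), end='')  # 342<TAB>343<TAB>344
```

For plain fields, '\t'.join(row) + '\n' produces the same line; csv.writer additionally handles quoting when a field contains the delimiter or a quote character.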

2 Comments

C.B. - thanks. But when I changed it to this: with open(absFile, 'rb') as csvfile: writer = csv.writer(csvfile, delimiter='\t') for row in writer: for field in convertList: if row[0].lower() == field.strip(): badFile.writerow('"%s"\n' % row) I got an error: > n:\scripts\deletemeafter\problemdna2.py(20)<module>() -> writer = csv.writer(csvfile, delimiter='\t') (Pdb) s TypeError: TypeErro...Writer',) > n:\scripts\deletemeafter\problemdna2.py(20)<module>() -> writer = csv.writer(csvfile, delimiter='\t') ..... TypeError: iteration ov
@susja you don't want to replace the reader; instead, create a new writer as outlined above and then call writer.writerow(row) (you will no longer be doing any string formatting). Updated with more detail.
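To illustrate the comment's point, here is a minimal sketch, assuming Python 3 and in-memory io.StringIO objects standing in for the real files. A csv.writer is output-only and cannot be iterated, and writerow is a method of the writer, not of the file object, so the reader stays for input and a separate writer handles output:

```python
import csv
import io

src = io.StringIO('abc\t1\t2\n')
out = io.StringIO()

writer = csv.writer(out, delimiter='\t', lineterminator='\n')
try:
    for row in writer:  # a writer only writes; iterating it raises TypeError
        pass
except TypeError as exc:
    print('TypeError:', exc)

# Keep the reader for input and use the separate writer for output.
for row in csv.reader(src, delimiter='\t'):
    writer.writerow(row)  # writerow belongs to the writer, not to the file object

print(repr(out.getvalue()))  # 'abc\t1\t2\n'
```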
