How to parse a text file in Python

Question

I have a directory containing many text files. Each file has many lines, and each line consists of tab-delimited fields. I need to filter the lines in these files by comparing the value in the first field against the values listed in another text file. Lines that match are 'bad' and should be copied to a new 'bad' file; lines that do not match are 'good' and should be copied to a new 'good' file. At the end I should have a 'good' and a 'bad' file for each input file. In other words, the script should parse each file in the directory, compare the first field of each line with the values in the other file, and copy the line to the appropriate new file. I wrote this:

import csv
import sys
import os

prefix = 'dna'
goodFiles = []
badFiles = []

fileList = os.listdir(sys.argv[1])

for f in fileList:
    absFile = os.path.join(os.path.abspath(sys.argv[1]), f )
    newBadF = "BADFile" + "_" + f
    badFile = open(newBadF,'w')
    newGoodF = "GOODFile" + "_" + f
    goodFile = open(newGoodF,'w')
    resultList = open(sys.argv[2], 'rb')
    convertList = list(resultList)
    with open(absFile, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t')
        for row in reader:
            for field in convertList:
                if row[0].lower() == field.strip():
                    badFile.writelines('"%s"\n' % row)
                    next
                else:
                    goodFile.writelines('"%s"\n' % row)
                    next

My script does not work: it produces files where each line is the string form of a list, like this: "['342', '343', '344', '345', '346', '347', '348', '349', '350']". The original files have a different format: no commas and no '[' or ']'. How can I fix this so the new files have the same format as the original ones? Thanks.

C.B. · Accepted Answer · 2014-04-17 16:29:11Z


You can use a csv.writer in the same way you are using a csv.reader if you want the output to keep the same delimiter:

bad_writer = csv.writer(badFile, delimiter='\t')
good_writer = csv.writer(goodFile, delimiter='\t')
...
if row[0].lower() == field.strip():
    bad_writer.writerow(row)
else:
    good_writer.writerow(row)

etc.
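For reference, here is a minimal end-to-end sketch of the approach above. It assumes Python 3 (hence text mode with newline='' instead of 'rb'), and the helper names load_keys and split_file are made up for illustration. It also departs from the question's code in two ways: the lookup values are lower-cased on both sides, and they are held in a set so that each row is written exactly once rather than once per entry in the lookup list.

```python
import csv
import os
import tempfile


def load_keys(path):
    """Read one lookup value per line, stripped and lower-cased (hypothetical helper)."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def split_file(src_path, bad_keys, good_path, bad_path):
    """Copy each tab-delimited row to bad_path if its first field is in bad_keys, else to good_path."""
    with open(src_path, newline='') as src, \
            open(good_path, 'w', newline='') as good, \
            open(bad_path, 'w', newline='') as bad:
        reader = csv.reader(src, delimiter='\t')
        good_writer = csv.writer(good, delimiter='\t', lineterminator='\n')
        bad_writer = csv.writer(bad, delimiter='\t', lineterminator='\n')
        for row in reader:
            if not row:  # skip completely empty lines
                continue
            writer = bad_writer if row[0].lower() in bad_keys else good_writer
            writer.writerow(row)  # writes the fields joined by tabs, no brackets


# Small demonstration on throwaway files (sample data is invented).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'sample.txt')
    keys = os.path.join(tmp, 'keys.txt')
    with open(src, 'w', newline='') as f:
        f.write('342\t343\t344\nABC\t1\t2\n350\t351\t352\n')
    with open(keys, 'w') as f:
        f.write('ABC\n350\n')

    good_path = os.path.join(tmp, 'GOODFile_sample.txt')
    bad_path = os.path.join(tmp, 'BADFile_sample.txt')
    split_file(src, load_keys(keys), good_path, bad_path)

    with open(good_path, newline='') as f:
        good_text = f.read()
    with open(bad_path, newline='') as f:
        bad_text = f.read()

print(repr(good_text))
print(repr(bad_text))
```

To process a whole directory, one option is to call split_file once per entry of os.listdir(...), as the loop in the question does. Note that csv.writer will add quotes around a field that itself contains a tab or a quote character, so the output matches the input exactly only for plain fields.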

When you call

badFile.writelines('"%s"\n' % row)

the % format operator converts the row (a list) to its string representation, which is where the brackets, quotes and commas come from:

>>> _list = [1,2,3]
>>> str(_list)
'[1, 2, 3]'
>>>
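As a small illustration of the difference, the sketch below formats the same row both ways. The sample row is invented; '\t'.join is shown only as a plain-string alternative for simple fields, not as a replacement for csv.writer.

```python
row = ['342', '343', '344']

# What the question's code writes: the list's repr wrapped in double quotes.
as_repr = '"%s"\n' % row

# Joining the fields with tabs reproduces the original line layout.
as_tabbed = '\t'.join(row) + '\n'

print(as_repr, end='')
print(as_tabbed, end='')
```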

edited Apr 17, 2014 at 16:29

answered Apr 17, 2014 at 15:01


2 Comments

susja Over a year ago

C.B. - thanks, but when I changed it to this: with open(absFile, 'rb') as csvfile: writer = csv.writer(csvfile, delimiter='\t') for row in writer: for field in convertList: if row[0].lower() == field.strip(): badFile.writerow('"%s"\n' % row) I got this error: > n:\scripts\deletemeafter\problemdna2.py(20)<module>() -> writer = csv.writer(csvfile, delimiter='\t') (Pdb) s TypeError: TypeErro...Writer',) > n:\scripts\deletemeafter\problemdna2.py(20)<module>() -> writer = csv.writer(csvfile, delimiter='\t') ..... TypeError: iteration ov

C.B. Over a year ago

@susja you don't want to replace the reader, but instead create a new writer as outlined above, and then call writer.writerow(row) (you will no longer be doing string formatting). Updated with more detail.
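A short sketch of the point made in this reply, assuming Python 3 and using in-memory io.StringIO buffers in place of real files: the reader is what you iterate over, while the writer only accepts rows through writerow and cannot be looped over.

```python
import csv
import io

source = io.StringIO('a\tb\nc\td\n')  # stands in for the input file
out = io.StringIO()                   # stands in for the output file

reader = csv.reader(source, delimiter='\t')
writer = csv.writer(out, delimiter='\t', lineterminator='\n')

# A csv writer is not iterable, so "for row in writer" raises TypeError.
try:
    iter(writer)
    writer_is_iterable = True
except TypeError:
    writer_is_iterable = False

# Iterate over the reader and pass each row (a list) straight to the writer.
for row in reader:
    writer.writerow(row)

result = out.getvalue()
print(writer_is_iterable)
print(repr(result))
```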
