I have a task: I have a directory that contains many text files. Each file has many lines, and each line consists of tab-delimited fields. I have to exclude some of the lines from these files by comparing the value in the first field with the values in another text file. The 'bad' lines (those that match) have to be copied to a new 'bad' file, and the 'good' lines (those that did not match) have to be copied to another 'good' file. At the end I should have many new files ('good' and 'bad'). In other words, the script should parse each file in the directory, compare the first field of each line with the values in another file, and copy the line into the corresponding new file. I wrote this:
import csv
import sys
import os

prefix = 'dna'
goodFiles = []
badFiles = []
fileList = os.listdir(sys.argv[1])
for f in fileList:
    absFile = os.path.join(os.path.abspath(sys.argv[1]), f)
    newBadF = "BADFile" + "_" + f
    badFile = open(newBadF, 'w')
    newGoodF = "GOODFile" + "_" + f
    goodFile = open(newGoodF, 'w')
    resultList = open(sys.argv[2], 'rb')
    convertList = list(resultList)
    with open(absFile, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t')
        for row in reader:
            for field in convertList:
                if row[0].lower() == field.strip():
                    badFile.writelines('"%s"\n' % row)
                    next
                else:
                    goodFile.writelines('"%s"\n' % row)
                    next
My script does not work :) It produces files where each line is a list, like this: "['342', '343', '344', '345', '346', '347', '348', '349', '350']", while the original files have a different format: no commas and no '[' or ']'. My question: how do I fix it so that the new files have the same format as the original ones? Thanks
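For reference, the list-like output comes from `'"%s"\n' % row`: `row` is a list, so `%s` writes the list's string representation, brackets and commas included. Below is a minimal sketch of one way to keep the tab-delimited layout, assuming Python 3. The helper names `load_keys` and `split_file` are made up for illustration and are not part of the script above. It writes rows back through `csv.writer` with the same tab delimiter and checks the first field against a set instead of looping over the list.

```python
import csv


def load_keys(list_path):
    """Read the lookup file into a set of stripped values (hypothetical helper)."""
    with open(list_path) as handle:
        # Same normalisation as the original comparison: strip only.
        return {line.strip() for line in handle if line.strip()}


def split_file(src_path, keys, good_path, bad_path):
    """Copy rows whose first field is in `keys` to bad_path, all others to good_path."""
    with open(src_path, newline='') as src, \
         open(good_path, 'w', newline='') as good, \
         open(bad_path, 'w', newline='') as bad:
        reader = csv.reader(src, delimiter='\t')
        # Writing with the same delimiter keeps the tab-separated layout.
        good_writer = csv.writer(good, delimiter='\t', lineterminator='\n')
        bad_writer = csv.writer(bad, delimiter='\t', lineterminator='\n')
        for row in reader:
            if not row:
                # Assumption: blank lines are skipped rather than copied.
                continue
            # Each row is written exactly once, to one of the two files.
            if row[0].lower() in keys:
                bad_writer.writerow(row)
            else:
                good_writer.writerow(row)
```

It could be called once per input file, for example `split_file(absFile, load_keys(sys.argv[2]), newGoodF, newBadF)`. If the fields may contain quote characters, `csv.writer` may add quoting; writing `'\t'.join(row) + '\n'` directly is an alternative in that case.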