I am converting the PDFs in a large folder to text and then writing certain keywords to an Excel file. Everything works except that, even though the folder contains multiple PDFs, each one overwrites the previous one starting at cell A1.
How do I make each new dictionary go to the next rows instead?
```python
import os

# convert_pdf_to_txt, worksheet, workbook and directory are defined elsewhere in the script
custData = {}

def data_grabbing(pdf):
    row = 0
    col = 0
    string = convert_pdf_to_txt(pdf)
    lines = list(filter(bool, string.split('\n')))  # drop empty lines
    for i in range(len(lines)):
        if 'Lead:' in lines[i]:
            custData['Name'] = lines[i+2]
        elif 'Date:Date:Date:Date:' in lines[i]:
            custData['Fund Manager'] = lines[i+2]
        elif 'Priority:' in lines[i]:
            custData['Industry'] = lines[i+2]
            custData['Date'] = lines[i+1]
            custData['Deal Size'] = lines[i+3]
        elif 'DEAL QUALIFYING MEMORANDUM' in lines[i]:
            custData['Owner'] = lines[i+2]
        elif 'Fund Manager' in lines[i]:
            custData['Investment Type'] = lines[i+2]
    print(custData)
    # labels on one row, values on the row below
    for item, descrip in custData.items():
        worksheet.write(row, col, item)
        worksheet.write(row+1, col, descrip)
        col += 1
    row += 2

for myFile in os.listdir(directory):
    if myFile.endswith(".pdf"):
        data_grabbing(os.path.join(directory, myFile))
workbook.close()
```
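A minimal sketch of one way around the reset, assuming the worksheet is an xlsxwriter-style object with a `write(row, col, value)` method: the writing function takes the starting row and returns the next free row, so the calling loop carries the position from one PDF to the next. It also builds a fresh dict per document (the global `custData` above is never cleared, so values from an earlier PDF can linger) and writes columns in a fixed order so they line up across PDFs. `COLUMNS`, `parse_fields`, `write_record` and `RecordingSheet` are hypothetical names, and the demo feeds in plain text where the real script would call `convert_pdf_to_txt`.

```python
# Hypothetical sketch: COLUMNS, parse_fields, write_record and RecordingSheet
# are illustrative names, not part of the original script.
COLUMNS = ['Name', 'Fund Manager', 'Industry', 'Date', 'Deal Size',
           'Owner', 'Investment Type']


def parse_fields(text):
    """Return a fresh dict of fields for one document's text."""
    lines = [line for line in text.split('\n') if line]  # drop blank lines

    def at(i):
        # lines[i] if it exists, else '' (avoids IndexError near the end)
        return lines[i] if i < len(lines) else ''

    data = {}
    for i, line in enumerate(lines):
        if 'Lead:' in line:
            data['Name'] = at(i + 2)
        elif 'Date:Date:Date:Date:' in line:
            data['Fund Manager'] = at(i + 2)
        elif 'Priority:' in line:
            data['Industry'] = at(i + 2)
            data['Date'] = at(i + 1)
            data['Deal Size'] = at(i + 3)
        elif 'DEAL QUALIFYING MEMORANDUM' in line:
            data['Owner'] = at(i + 2)
        elif 'Fund Manager' in line:
            data['Investment Type'] = at(i + 2)
    return data


def write_record(worksheet, row, data):
    """Write labels on `row` and values on `row + 1`; return the next free row."""
    for col, key in enumerate(COLUMNS):
        worksheet.write(row, col, key)
        worksheet.write(row + 1, col, data.get(key, ''))
    return row + 2


class RecordingSheet:
    """Stand-in for a worksheet: stores each write in a (row, col) dict."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


if __name__ == '__main__':
    sheet = RecordingSheet()
    texts = ['Lead:\nx\nAlice\n',
             'Lead:\nx\nBob\nPriority:\n2020-01-01\nTech\n$5M\n']
    row = 0
    for text in texts:  # stands in for looping over the PDF files
        row = write_record(sheet, row, parse_fields(text))
    print(row)  # prints 4: the next free row after two documents
```

In the real script, the loop over `os.listdir(directory)` would pass each file's `convert_pdf_to_txt` output to `parse_fields` and keep `row = write_record(worksheet, row, ...)`, closing the workbook after the loop. Returning the row rather than relying on a global keeps the position visible in the caller.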
Move `row += 2` to inside your `for item...` loop.

Define `row = 0` outside your function, in global scope. Inside your function, replace `row = 0` with `global row`. Not really the best way to handle persistent state, but you can get away with it.

Moving `row += 2` inside the loop won't fix it. The OP's real issue is that the next call to `data_grabbing` doesn't start two rows down (and then two more rows down on the next call, and so on).