Parsing and outputting file as CSV in Python

Question

I am trying to parse a text file that has the following format:

+++++
line1
line2
<<<<<
+++++
rline1
rline2
<<<<<

where +++++ marks the start of a record and <<<<< marks the end of a record.

Now I want to output the whole text as CSV in the following format:

line1, line2
rline1, rline2

I am trying something like this:

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines =[]

for line in lines:
    if (line == "+++++") or not(line == "<<<<<") :
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")

print (output_lines)

I am not sure how to move forward from here.

martinenzinger · Accepted Answer · 2014-10-19 15:20:01Z


Maybe something like this?

from itertools import groupby
import csv

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

# remove the +++++s, so that only the <<<<<s indicate line breaks
# compare by value with !=; "is not" tests object identity and is unreliable for strings
cleaned_list = [x for x in lines if x != "+++++"]

# separate at <<<<<s
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]

# newline='' lets the csv module control line endings itself
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# read the file back to check the result
with open('result.csv') as f:
    print(f.read())

edited Oct 19, 2014 at 15:20

answered Oct 19, 2014 at 1:05

martinenzinger


1 Comment

PaulMcG Over a year ago

Nice use of groupby, but you might want to add a little description of just what is going on here.
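
To illustrate what the groupby step is doing, here is a minimal sketch that looks at it in isolation. groupby batches consecutive items that share the same key; the key here is whether an item equals the end marker, so the groups alternate between record lines (key False) and the marker itself (key True). The names groups and is_marker are only illustrative and do not appear in the answer above.

```python
from itertools import groupby

# Input as it looks after the +++++ markers have been removed
lines = ['line1', 'line2', '<<<<<', 'rline1', 'rline2', '<<<<<']

# Each run of consecutive items with the same key becomes one group
groups = [(is_marker, list(group))
          for is_marker, group in groupby(lines, lambda x: x == '<<<<<')]

# Keeping only the groups whose key is False leaves one list per record
rows = [items for is_marker, items in groups if not is_marker]

print(groups)
print(rows)
```

Filtering on "if not k" in the answer is the same as dropping the True groups here, which is why the end markers never reach the CSV file.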

Martijn Pieters · Accepted Answer · 2014-10-19 00:38:37Z

Collect lines in nested loops until the end-of-record marker, and write out the resulting list to a CSV file:

import csv

with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue

        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))

        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)

The outer loop takes lines from the input file, but skips anything that doesn't look like the start of a record. Once a start is found, a second, inner loop draws more lines from the file object, and as long as they do not look like the end of the record, adds them to a list object (sans the line separator).

When the end of a record is found, the inner loop is ended, and if any lines were collected in the row list, it is written out to the CSV file.

Demo:

>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
... 
line1,line2
13
rline1,rline2
15

The numbers after the written lines are the return values of writer.writerow(), echoed by the interactive interpreter: the number of characters written, including the \r\n line terminator.
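
If the same parsing is needed in more than one place, the nested loops could be wrapped in a generator. This is only a sketch of that idea: the function name iter_records and its start and end parameters are made up for illustration, and it assumes the same marker format as in the question.

```python
import csv
import io

def iter_records(lines, start='+++++', end='<<<<<'):
    """Yield one list of stripped lines per start/end delimited record."""
    lines = iter(lines)  # share a single iterator between both loops
    for line in lines:
        if not line.startswith(start):
            continue
        # found start, collect lines until end-of-record
        row = []
        for line in lines:
            if line.startswith(end):
                break
            row.append(line.rstrip('\n'))
        if row:
            yield row

# Usage with in-memory files standing in for the real input and output
sample = io.StringIO('+++++\nline1\nline2\n<<<<<\n+++++\nrline1\nrline2\n<<<<<\n')
out = io.StringIO()
csv.writer(out).writerows(iter_records(sample))
result = out.getvalue()
print(result)
```

Because the generator accepts any iterable of lines, it can be fed an open file object directly, and writer.writerows() consumes the records one at a time without building the whole list in memory.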
