I am trying to parse a text file that has the following format:

+++++
line1
line2
<<<<<
+++++
rline1
rline2
<<<<<

where +++++ marks the start of a record and <<<<< marks the end of a record.

Now I want to write the whole text to a CSV file in the following format:

line1, line2
rline1, rline2

I am trying something like this:

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines =[]

for line in lines:
    if (line == "+++++") or not(line == "<<<<<") :
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")

print (output_lines)

I am not sure how to move forward from here.

2 Answers

Maybe something like this?

from itertools import groupby
import csv

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

# remove the +++++s, so that only the <<<<<s indicate line breaks
cleaned_list = [x for x in lines if x != "+++++"]  # compare by value; 'is not' tests identity

# separate at <<<<<s
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]

# newline='' lets the csv module control line endings (Python 3)
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# read the file back to check the result
with open('result.csv') as f:
    print(f.read())
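
To show what the groupby step is doing, here is a minimal sketch that prints the intermediate groups. The names cleaned, runs and is_end are only illustrative. groupby collects consecutive items that share the same key, so each run of ordinary lines becomes one group and each <<<<< marker becomes a group of its own, which is then discarded.

```python
from itertools import groupby

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
cleaned = [x for x in lines if x != '+++++']

# One (key, group) pair per run of consecutive items with the same key.
# The key is True for an end-of-record marker and False for a data line.
runs = [(is_end, list(group))
        for is_end, group in groupby(cleaned, lambda x: x == '<<<<<')]
print(runs)
# [(False, ['line1', 'line2']), (True, ['<<<<<']), (False, ['rline1', 'rline2']), (True, ['<<<<<'])]

# Keeping only the runs whose key is False leaves one list per record.
rows = [group for is_end, group in runs if not is_end]
print(rows)
# [['line1', 'line2'], ['rline1', 'rline2']]
```

Note that this relies on every record being closed by <<<<<; two records with no end marker between them would be merged into one group.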

1 Comment

Nice use of groupby, but you might want to add a little description of just what is going on here.

Collect lines in nested loops until the end-of-record marker, and write out the resulting list to a CSV file:

import csv

with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue

        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))

        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)

The outer loop takes lines from the input file, but skips anything that doesn't look like the start of a record. Once a start is found, a second, inner loop draws more lines from the file object, and as long as they do not look like the end of the record, adds them to a list object (sans the line separator).

When the end of a record is found, the inner loop is ended, and if any lines were collected in the row list, it is written out to the CSV file.
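
The same nested-loop idea can be sketched for an in-memory list such as the lines list in the question. This is only a sketch: iter_records is a hypothetical helper name, and the start and end parameters are assumptions added for illustration. The one detail that matters is that both loops must share a single iterator. A file object is its own iterator, but a list is not, so the sketch calls iter() once up front.

```python
def iter_records(lines, start='+++++', end='<<<<<'):
    """Yield one list of lines per record (hypothetical helper)."""
    # A single shared iterator: the inner loop resumes where the outer one stopped.
    it = iter(lines)
    for line in it:
        if not line.startswith(start):
            continue
        # found start, collect lines until end-of-record
        row = []
        for line in it:
            if line.startswith(end):
                break
            row.append(line.rstrip('\n'))
        if row:
            yield row


lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
records = list(iter_records(lines))
print(records)
# [['line1', 'line2'], ['rline1', 'rline2']]
```

Looping over the list directly in both places would restart the inner loop from the first element each time, which is why the explicit iter() call is there.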

Demo:

>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
... 
line1,line2
13
rline1,rline2
15

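The numbers after the written lines are the return values of writer.writerow(), which passes through the result of the underlying write() call: the number of characters written, including the \r\n line terminator. They appear only because the interactive interpreter echoes them; they are not part of the CSV output.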
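
A small sketch that makes this visible by writing to an in-memory buffer instead of sys.stdout. The names buffer and count are only illustrative, and the default csv dialect (with its \r\n line terminator) is assumed.

```python
import csv
from io import StringIO

buffer = StringIO()
writer = csv.writer(buffer)

# writerow() returns whatever the underlying write() call returned;
# for a text buffer that is the number of characters written.
count = writer.writerow(['line1', 'line2'])
print(count)
# 13

# 11 characters of data plus the default '\r\n' line terminator
print(repr(buffer.getvalue()))
# 'line1,line2\r\n'
```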
