Reading substrings from string in Python

Question

I am doing some research where I have more than 25,000 reports in one large text file. Each report is delimited by "TEXTSTART[UNIQUE-ID]" and "TEXTEND".

So far I have succeeded in reading a single report (that is, the text between the identifiers) from the text file with this code:

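# Read the whole file into memory; the with-block closes it automatically.
with open("samples_combined_incomplete.txt", "r") as f:
    report = f.read()

rstart = "TEXTSTART"
rend = "TEXTEND"

# Take the text after the first TEXTSTART and before the next TEXTEND.
a = report.split(rstart)[1].split(rend)[0]
print(a)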

My question is this: how can I divide the text document into uniquely identifiable substrings, based on TEXTSTART[UNIQUE-ID]? And how should the ID be returned?

I am just starting, so any advice on documentation, useful functions, etc. would be much appreciated.

Thank you, works like a charm! The IDs are a combination of numbers and characters FYI.

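import re

with open("samples_combined_incomplete.txt", "r") as f:
    report = f.read()

# Each match is an (id, body) tuple; re.DOTALL lets "." span line breaks.
# Named "reports" rather than "dict" so the built-in dict is not shadowed.
reports = re.findall(r'TEXTSTART\[(.*?)\](.*?)TEXTEND', report, re.DOTALL)

# Print the first ten (id, body) pairs (fewer if the file holds fewer).
for pair in reports[:10]:
    print(pair)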

If I want to search within the containers for a specific keyword and have the keys returned, how could I do that?

Have you considered regular expressions? (docs.python.org/2/library/re.html) Also, is each of these substrings on a new line?
– Inbar Rose, Commented Dec 9, 2012 at 15:53

bluepnume · Accepted Answer · 2012-12-09 16:03:46Z

5

import re
# Map each report ID to its body; assumes `report` holds the file contents.
print(dict(re.findall(r'TEXTSTART\[([^\]]+)\](.*?)TEXTEND', report, re.DOTALL)))
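
For the follow-up about searching within the reports for a keyword and getting the IDs back, one possible approach is to build the dictionary once and then filter it. The following is only a minimal sketch: the sample text, the helper names `split_reports` and `ids_containing`, and the choice of a case-insensitive substring match are all assumptions made for illustration, not part of the original answer.

```python
import re

# Hypothetical sample data standing in for the real file contents.
report = (
    "TEXTSTART[A1]\nFirst report about apples.\nTEXTEND\n"
    "TEXTSTART[B2]\nSecond report about pears.\nTEXTEND\n"
    "TEXTSTART[C3]\nThird report, apples again.\nTEXTEND\n"
)

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r"TEXTSTART\[([^\]]+)\](.*?)TEXTEND", re.DOTALL)


def split_reports(text):
    """Return a dict mapping each report ID to its stripped body."""
    return {rid: body.strip() for rid, body in PATTERN.findall(text)}


def ids_containing(reports, keyword):
    """Return the IDs of reports whose body contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [rid for rid, body in reports.items() if needle in body.lower()]


reports = split_reports(report)
print(ids_containing(reports, "apples"))  # ['A1', 'C3']
```

A plain substring test also matches inside longer words; if whole-word matching matters, `re.search` with `\b` word boundaries would be one alternative.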

edited Dec 9, 2012 at 16:03

answered Dec 9, 2012 at 15:52


1 Comment

Blckknght Over a year ago

If the text spans multiple lines, I think this will need re.DOTALL to be specified as an option.
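
The effect of that flag can be checked on a small example. This is a sketch with a made-up two-line report (the ID `X9` and the text are hypothetical): without `re.DOTALL`, `.` does not match a newline, so a report that spans several lines is not found at all.

```python
import re

# Hypothetical report whose body spans several lines.
sample = "TEXTSTART[X9]\nline one\nline two\nTEXTEND"
pattern = r"TEXTSTART\[([^\]]+)\](.*?)TEXTEND"

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, sample)

# With re.DOTALL, "." also matches newlines and the whole body is captured.
with_flag = re.findall(pattern, sample, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # [('X9', '\nline one\nline two\n')]
```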
