I have several different strings that always contain myWord (multiple times in some cases; only the first occurrence should be handled), but the strings differ in length: some of them contain hundreds of words, others only a few.
I would like a way to extract a snippet from the text. The rule is: the snippet should contain myWord and the X words before and after it.
Something like this:
rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."
myWord = "sentence"
Let's say I want the word 'sentence' plus the 3 words on either side of it, like this:
"example lorem ipsum sentence for a Stackoverflow"
I have a working solution, but it cuts the snippet by a number of characters instead of by a number of words before/after myWord. So my question is: is there a more suitable solution, maybe a built-in Python function, to achieve this?
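Python has no single built-in that returns "N words around a match", but the standard re module gets close. The sketch below is one possible approach, not a standard recipe: it builds a pattern that allows up to n whitespace-separated tokens before and after myWord, and re.search returns the leftmost match, which covers the first occurrence. The function name regex_snippet is hypothetical, and it assumes myWord is a plain alphanumeric word (so \b word boundaries behave as expected).

```python
import re
from typing import Optional


def regex_snippet(text: str, word: str, n: int) -> Optional[str]:
    """Hypothetical helper: `word` plus up to `n` tokens on each side.

    Only the first (leftmost) occurrence is used. Matching is
    case-insensitive and respects word boundaries, so "sent" does
    not match inside "sentence".
    """
    pattern = (
        r"(?:\S+\s+){0,%d}"   # up to n whitespace-separated tokens before
        r"\S*?\b%s\b\S*"      # the word, allowing punctuation attached to it
        r"(?:\s+\S+){0,%d}"   # up to n tokens after
    ) % (n, re.escape(word), n)
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(0) if match else None


rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."
print(regex_snippet(rawText, "sentence", 3))
# example lorem ipsum sentence for a Stackoverflow
```

Because the snippet is cut straight out of the original string, the original spacing between words is preserved.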
The current solution I use:
myWord = "mollis"
rawText = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse sit amet arcu vulputate, sodales arcu non, finibus odio. Aliquam sed tincidunt nisi, eu scelerisque lectus. Curabitur in nibh enim. Duis arcu ante, mollis sed iaculis non, hendrerit ut odio. Curabitur gravida condimentum posuere. Sed et arcu finibus felis auctor mollis et id risus. Nam urna tellus, ultricies a aliquam at, euismod et erat. Cras pretium venenatis ornare. Donec pulvinar dui eu dui facilisis commodo. Vivamus eget ultrices turpis, vel egestas lacus."
# The index where the word is located (case-insensitive; -1 if not found)
wordIndexNumber = rawText.lower().find(myWord.lower())
# The total length of the text (in chars)
textLength = len(rawText)
# Number of chars from the word to the end of the text
textPart2 = textLength - wordIndexNumber
if wordIndexNumber < 80:
    textIndex1 = 0
else:
    textIndex1 = wordIndexNumber - 80
if textPart2 < 80:
    textIndex2 = textLength
else:
    textIndex2 = wordIndexNumber + 80
snippet = rawText[textIndex1:textIndex2]
print(snippet)
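As a side note, the two if/else blocks above can be collapsed: max() handles the start of the text, and Python slicing already stops at the end of a string, so an end index past len(rawText) is harmless. A minimal sketch of the same 80-character logic, with a hypothetical function name char_snippet and an explicit check for the not-found case (find() returns -1):

```python
from typing import Optional


def char_snippet(text: str, word: str, width: int = 80) -> Optional[str]:
    """Hypothetical helper: up to `width` chars before the first `word`,
    and the `width` chars starting at the word."""
    index = text.lower().find(word.lower())
    if index == -1:
        return None  # word not present
    # max() replaces the "< 80" check; slicing past the end is allowed.
    return text[max(0, index - width):index + width]
```

This still counts characters, not words, so it only tidies the current approach rather than answering the question.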
Call split() on your string, then apply the same slicing logic from your character-based solution to the resulting list of words, so the offsets count words instead of characters.
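A minimal sketch of that idea, assuming words are separated by whitespace and that punctuation attached to a word (such as "ante," or "question.") should be ignored when matching. The function name word_snippet is hypothetical:

```python
import string
from typing import Optional


def word_snippet(text: str, word: str, n: int) -> Optional[str]:
    """Hypothetical helper: first occurrence of `word` plus `n` words on each side."""
    words = text.split()
    target = word.lower()
    for i, token in enumerate(words):
        # Compare case-insensitively, ignoring leading/trailing punctuation.
        if token.strip(string.punctuation).lower() == target:
            start = max(0, i - n)  # don't go before the first word
            # Slicing past the end of the list is safe, so no upper check is needed.
            return " ".join(words[start:i + n + 1])
    return None  # word not found


rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."
print(word_snippet(rawText, "sentence", 3))
# example lorem ipsum sentence for a Stackoverflow
```

Note that " ".join() rebuilds the snippet with single spaces, so runs of whitespace or newlines in the original text are not preserved.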