
Given an array of separators:

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]

and a string where some columns were left blank (and there is random whitespace):

input = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

How can I get this:

["John", "123:45", "8/2/17", "", "", "i love notes"]

I've tried simply removing the substrings to see where I can go from there, but I'm still stuck:

import re
# Removes every separator, but the boundaries between the fields are lost as well
input = re.sub(r'|'.join(map(re.escape, columns)), "", input)

2 Answers


Use the list to generate a regular expression by inserting a capture group (.*) after each separator, then use strip() to remove the surrounding spaces:

import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

result = [x.strip() for x in re.match("".join(map("{}(.*)".format,columns)),s).groups()]

print(result)

yields:

['John', '123:45', '8/2/17', '', '', 'i love notes']
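If you would rather have the values keyed by column name, one option (an addition, not part of the original answer) is to zip the captured groups with the separator list. This sketch reuses the answer's pattern and assumes each separator appears exactly once, in the listed order:

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

# One capture group after each separator, in the same order as the columns list
pattern = "".join(map("{}(.*)".format, columns))
values = [x.strip() for x in re.match(pattern, s).groups()]

# Pair each value with its column name, dropping the trailing colon from the key
record = {name.rstrip(":"): value for name, value in zip(columns, values)}
print(record)
```

This prints {'Name': 'John', 'ID': '123:45', 'Date': '8/2/17', 'Building': '', 'Room': '', 'Notes': 'i love notes'}.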

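The strip step can also be handled by the regular expression itself, at the expense of a more complex regex but a simpler overall expression (note that the greedy (.*) only drops leading whitespace; whitespace sitting just before the next separator would still be captured):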

result = re.match("".join(map(r"{}\s*(.*)\s*".format, columns)), s).groups()
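A variation (not the answerer's code) that also removes whitespace right before each separator: make the groups non-greedy so the following \s* gets that whitespace, and use re.fullmatch so the last group is still forced to run to the end of the string. The sample string here is modified with extra spaces to show the effect:

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
# Same data as the question, with extra spaces before "ID:" and at the very end
s = "Name:   John  ID:123:45Date:  8/2/17Building:Room:Notes:  i love notes  "

# Non-greedy (.*?) leaves surrounding whitespace to the \s* on either side;
# fullmatch anchors the pattern at both ends so the final group is not left empty
pattern = "".join(r"{}\s*(.*?)\s*".format(re.escape(c)) for c in columns)
result = list(re.fullmatch(pattern, s).groups())
print(result)
```

This prints ['John', '123:45', '8/2/17', '', '', 'i love notes'].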

More robust: if the column names contain regex special characters, they have to be escaped (not needed here):

result = re.match("".join([r"{}\s*(.*)\s*".format(re.escape(x)) for x in columns]), s).groups()
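To see why escaping matters, here is a small sketch with hypothetical separators containing metacharacters ("?" and parentheses); these column names are made up for illustration. Without re.escape, "(old)" becomes a capture group matching the bare text "old", so the pattern no longer matches the literal separator:

```python
import re

# Hypothetical separators containing regex metacharacters
columns = ["Name?", "ID (old):", "Notes:"]
s = "Name? JohnID (old):42Notes: none"

escaped = "".join(r"{}\s*(.*)\s*".format(re.escape(c)) for c in columns)
unescaped = "".join(r"{}\s*(.*)\s*".format(c) for c in columns)

print(re.match(escaped, s).groups())  # ('John', '42', 'none')
print(re.match(unescaped, s))         # None: "ID (old):" is read as "ID old:"
```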

Comments:

For some reason I am getting ['John', '123:45', '8/2/17', '', '', '']
Edited: the greedy mode apparently caused problems. Now fixed.
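The earlier revision is not shown here, so this is an assumption, but an empty last field like the one in the first comment is what you get from non-greedy groups without an end anchor: nothing after the final (.*?) forces it to consume any characters. A quick comparison:

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
s = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

lazy = "".join(r"{}\s*(.*?)\s*".format(c) for c in columns)
greedy = "".join(r"{}\s*(.*)\s*".format(c) for c in columns)

# Lazy and unanchored: the last group is satisfied by the empty string
print(repr(re.match(lazy, s).groups()[-1]))    # ''
# Greedy: the last group runs to the end of the string
print(repr(re.match(greedy, s).groups()[-1]))  # 'i love notes'
```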

How about using re.split?

>>> import re
>>> columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
>>> i = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"
>>> re.split('|'.join(map(re.escape, columns)), i)
['', '      John', '123:45', '  8/2/17', '', '', '  i love notes']

To get rid of the whitespace, split on whitespace too:

>>> re.split(r'\s*' + (r'\s*|\s*'.join(map(re.escape, columns))) + r'\s*', i.strip())
['', 'John', '123:45', '8/2/17', '', '', 'i love notes']

Comments:

Good, and probably what the OP had in mind, but it produces an empty field at the start.
@Jean-FrançoisFabre The empty field at the start appears because you split on "Name:" and there is nothing to its left, so split yields an empty string there. In other input there could be something to the left.
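Following up on the comments, a minimal sketch (not from either answer) that keeps whatever precedes the first separator separate from the six fields, using star-unpacking on the re.split result:

```python
import re

columns = ["Name:", "ID:", "Date:", "Building:", "Room:", "Notes:"]
i = "Name:      JohnID:123:45Date:  8/2/17Building:Room:Notes:  i love notes"

# Same pattern as the answer: each separator plus any whitespace around it
pattern = r"\s*" + r"\s*|\s*".join(map(re.escape, columns)) + r"\s*"
prefix, *fields = re.split(pattern, i.strip())

print(repr(prefix))  # text before "Name:" ('' for this input)
print(fields)        # ['John', '123:45', '8/2/17', '', '', 'i love notes']
```

Unlike the re.match approach, re.split does not check that the separators appear in the listed order, so a reordered input would still split but the fields would no longer line up with columns.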
