Extracting multiple values from string in python with regex

Question

I have multiple strings that look like this: product: green apples price: 2.0 country: france company: somecompany. Some strings have fewer fields; for example, some are missing the company name or the country. I am trying to extract only the values and skip the labels product, price, country, and company. I tried creating a separate regex for each field, each matching from the left side of the string.

blah="product: green apples price: 2.0 country: france company: somecompany"

product_reg = re.compile(r'.*?\bproduct\b:(.*).*')
product_reg_strip = re.compile(r'(.*?)\s[a-z]:?')

product_full=re.findall(product_reg, blah)
prod=re.find(product_reg_strip, str(product_full))
print prod

price_reg = re.compile(r'.*?\bprice\b:(.*).*')
price_reg_strip = re.compile(r'(.*?)\s[a-z]:?')

price_full=re.findall(price_reg, blah)
price=re.find(price_reg_strip, str(price_full))
print price

But this is not working. What should I do to make this regex more sane?

What do you want the output to be? In your example, is it green apples 2.0 france somecompany? – tdelaney, Commented Apr 20, 2017 at 16:11

Francesco De Rosa · Accepted Answer · 2017-04-21 09:55:06Z

You can use a single regexp with named groups. It matches whether or not the optional fields (price, country, company) are present, as you asked. Try this regexp with the global and multiline flags on regex101.com (https://regex101.com/r/iccVUv/1/):

^(?:product:(?P<product>.*?))(?:price:(?P<price>.*?))?(?:country:(?P<country>.*?))?(?:company:(?P<company>.*))?$

In Python you can, for example, do this:

import re

pattern = r'^(?:product:(?P<product>.*?))(?:price:(?P<price>.*?))?(?:country:(?P<country>.*?))?(?:company:(?P<company>.*))?$'
matches = re.search(pattern, 'product: green apples price: 2.0 country: italy company: italian company')

Now you can get data simply using:

product = matches.group('product')

Finally, you only need to check that the group actually matched and strip the surrounding spaces, like this:

if matches.group('product') is not None:
  product = matches.group('product').strip()
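
Putting the pieces together, here is a minimal sketch of how the same pattern might be wrapped in a helper that returns only the fields that are present. The name `parse_fields` and the second sample string are illustrative assumptions, not part of the answer above. Note that the pattern as written still requires `product:` at the start of the string.

```python
import re

# Same pattern as above, compiled once; every field except product is optional.
PATTERN = re.compile(
    r'^(?:product:(?P<product>.*?))'
    r'(?:price:(?P<price>.*?))?'
    r'(?:country:(?P<country>.*?))?'
    r'(?:company:(?P<company>.*))?$'
)

def parse_fields(line):
    """Hypothetical helper: return a dict of the fields found in line."""
    match = PATTERN.search(line)
    if match is None:
        return {}
    # groupdict() maps each named group to its text, or to None
    # when the optional group did not take part in the match.
    return {name: value.strip()
            for name, value in match.groupdict().items()
            if value is not None}

full = parse_fields('product: green apples price: 2.0 country: italy company: italian company')
partial = parse_fields('product: green apples country: france')
print(full)
print(partial)
```

Using `groupdict()` avoids repeating the `is not None` check once per field.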

Toto · Accepted Answer · 2017-04-20 16:21:34Z

You could split the string like this:

import re

s = "product: green apples price: 2.0 country: france company: somecompany"
p = re.compile(r'(\w+:)')
res = p.split(s)
print(res)
# Keys sit at the odd indices; each one is followed by its value.
for i in range(1, len(res), 2):
    print(res[i], ' ==> ', res[i+1])

Output:

['', 'product:', ' green apples ', 'price:', ' 2.0 ', 'country:', ' france ', 'company:', ' somecompany']

product:  ==>   green apples 
price:  ==>   2.0 
country:  ==>   france 
company:  ==>   somecompany
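
If a mapping is more convenient than printed pairs, the keys and values in the split result can be zipped together. This is a minimal sketch, assuming every key is a single word followed by a colon; the variable names are illustrative.

```python
import re

text = "product: green apples price: 2.0 country: france company: somecompany"

# Splitting on a captured group keeps the keys in the result list.
parts = re.split(r'(\w+:)', text)

# parts[0] is the text before the first key; keys then sit at the
# odd indices and their values at the following even indices.
keys = [k.rstrip(':') for k in parts[1::2]]
values = [v.strip() for v in parts[2::2]]
fields = dict(zip(keys, values))
print(fields)
```

Strings with missing fields need no special handling here, since only the keys that actually occur end up in the dict.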

edited Apr 20, 2017 at 16:21

answered Apr 20, 2017 at 16:10

tdelaney · Accepted Answer · 2017-04-20 16:16:41Z

I'm not completely sure what you are after, but if the things you want to remove are a single word followed by a colon, the regex is pretty easy. Here are a couple of samples.

>>> import re
>>> blah="product: green apples price: 2.0 country: france company: somecompany"
>>> re.sub(r'\w+: ?', '', blah)
'green apples 2.0 france somecompany'
>>> re.split(r'\w+: ?', blah)[1:]
['green apples ', '2.0 ', 'france ', 'somecompany']
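
As a variation that is not part of the answer above, `re.findall` with a lookahead can return each label together with its already-trimmed value. This is a sketch under the same assumption that a label is a single word followed by a colon; a value that itself contains such a word would be split incorrectly.

```python
import re

blah = "product: green apples price: 2.0 country: france company: somecompany"

# Capture each label, then lazily capture its value up to (but not
# including) the next "word:" label or the end of the string.
pairs = re.findall(r'(\w+):\s*(.*?)\s*(?=\w+:|$)', blah)
print(pairs)

# The list of (label, value) tuples converts directly to a dict.
fields = dict(pairs)
print(fields['price'])
```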

answered Apr 20, 2017 at 16:16
