How to parse this string with one regular expression in Python

Question

I need to parse this string with a single regular expression in Python, saving the value of each group in a specific field. The problem is that one or more of the parameters may be missing or appear in a different order (e.g. domain 66666 ip nonce, with the middle part missing).

3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h

I need to assign:

time=2013-02-10T06:44:30.666821+00:00 (constant format)
domain=domain (string)
code=66666 (integer)
ip=127.0.0.1 (string)
pubvalue=kjiduensofksidoposiw (string of fixed length)
nonce=7896089hujoiuhiuh098h (string)

EDIT

This is an example of how the string can vary: 123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h

Thank you in advance for showing me the way.

When you need to grab values out of a string with a variable number of different items in a variable order, it is not a job for one regex. Why do you have this requirement for a single regex?
– user1919238, Commented Feb 21, 2013 at 9:22
If the string isn't regular, then you're asking for hassle by trying to apply one regular expression to it.
– Lorcan O'Neill, Commented Feb 21, 2013 at 9:24
More detail is required on how the string to be parsed may vary. Please also provide code to adapt.
– MikeM, Commented Feb 21, 2013 at 9:42
@NoobTom: I would say this is wrong! Grouping, backreferences, lookaheads, lookbehinds and the like will be very slow, because your whole file will be interpreted in one step. I think there will be a lot of backtracking, which makes it slow.
– tuxtimo, Commented Feb 21, 2013 at 9:45
@NoobTom, I don't think you should call re.match with the entire pattern for each term. It would probably be better to have intermediate steps: split the string on whitespace, split the request on &, then check the terms for the things you want. If you can define exactly what will change and what won't, that will help you make the most efficient algorithm. For example, if the time is always the second term, you can just take it instead of testing for it.
– user1919238, Commented Feb 21, 2013 at 10:01

tuxtimo · Accepted Answer · 2013-02-21 09:47:26Z

It's probably not a good idea to use one regex to parse the whole string, but if you have to, I think the solution is to use named groups (see "Named groups" on the Regex Tutorial, regular-expressions.info/named.html). A named group is written as (?P<nameofgroup>bla).

So you can match for example the ip with:

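import re

# Sample log line from the question
line = "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
# The named group "ip" captures the dotted quad between the square brackets
match = re.search(r"\[(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]", line)
print(match.groupdict())  # {'ip': '127.0.0.1'}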

To capture the other fields, extend this regular expression with one named group per field.

You can make a group optional by placing a ? after the group's closing parenthesis, like: (?P<ip>pattern)?. If an optional group does not take part in the match, its entry in the dict will be None.
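As a rough sketch of what such an extended pattern could look like, the example below combines named groups for the leading fields with optional groups for pubvalue and nonce. The names LOG_PATTERN and parse_line are made up for illustration, and the layout (id, time, domain, a filler word, code, sync:[ip]) is assumed from the two sample lines in the question. This version only finds nonce when it appears after pubvalue; handling arbitrary parameter order in one pattern would need more machinery.

```python
import re

# Hypothetical pattern: the field layout is assumed from the sample lines.
LOG_PATTERN = re.compile(
    r"^\S+\s+"                                    # leading id, not captured
    r"(?P<time>\S+)\s+"                           # timestamp
    r"(?P<domain>\S+)\s+"                         # domain name
    r"\S+\s+"                                     # filler word ("constant" in the samples)
    r"(?P<code>\d+)\s+"                           # numeric code
    r"sync:\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]"   # IPv4 address in brackets
    r"(?:.*?\bpubvalue=(?P<pubvalue>[^&\s]+))?"   # optional request parameter
    r"(?:.*?\bnonce=(?P<nonce>[^&\s]+))?"         # optional request parameter
)


def parse_line(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["code"] = int(fields["code"])  # the question asks for an integer
    return fields


sample = (
    "123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 "
    "sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h"
)
print(parse_line(sample))
```

Optional groups that did not take part in the match come back as None, so the caller can tell a missing parameter from an empty one.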

But note: it is not a good idea to do all of this in a single regex. It can be slow (because of backtracking), and the regex will be long and hard to maintain!
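For comparison, here is a minimal sketch of the multi-step approach suggested in the comments on the question: split the head on whitespace, split the request on &, then pick out the wanted terms. The function name parse_line_stepwise is hypothetical, and the token positions (time second, domain third, code fifth, sync:[ip] sixth) are an assumption taken from the sample lines, so adjust them if your data differs.

```python
import re

# Only the bracketed IP still needs a (small) regex.
IP_RE = re.compile(r"sync:\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_line_stepwise(line):
    """Parse one log line in steps; return None if the head is too short."""
    # Step 1: separate the fixed head from the request parameters.
    head, _, request = line.partition("Request:")
    tokens = head.split()
    if len(tokens) < 6:
        return None

    # Step 2: split the request on & and collect key=value pairs in any order.
    params = {}
    for pair in request.strip().split("&"):
        key, sep, value = pair.partition("=")
        if sep:
            params[key] = value

    # Step 3: pick the fields by position (assumed layout) or by key.
    ip_match = IP_RE.fullmatch(tokens[5])
    return {
        "time": tokens[1],
        "domain": tokens[2],
        "code": int(tokens[4]),
        "ip": ip_match.group(1) if ip_match else None,
        "pubvalue": params.get("pubvalue"),
        "nonce": params.get("nonce"),
    }


reordered = (
    "123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 "
    "sync:[192.168.0.1] Request: nonce=7896089hujoiuhiuh098h&pubvalue=fggggggeesidoposiw"
)
print(parse_line_stepwise(reordered))
```

Because the request parameters end up in a dict, their order no longer matters, and a missing parameter simply yields None.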

edited Feb 21, 2013 at 9:47

answered Feb 21, 2013 at 9:32


2 Comments

NoobTom Over a year ago

Thank you, I did not know about named groups! This is a great help for assigning the match to a variable!

tuxtimo Over a year ago

So, have a look at the link in my edited answer: regular-expressions.info/named.html
