I need to parse this string with a single regular expression in Python, saving the value of each group in a specific field. The problem is that one or more of the parameters may be missing or appear in a different order (e.g. domain 66666 ip nonce, with the middle part missing).

3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h

I need to assign:

  • time=2013-02-10T06:44:30.666821+00:00 (constant format)
  • domain=domain (string)
  • code=66666 (integer)
  • ip=127.0.0.1 (string)
  • pubvalue=kjiduensofksidoposiw (string of fixed length)
  • nonce=7896089hujoiuhiuh098h (string)

EDIT

This is an example of how the string can vary:

123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h

Thank you in advance for showing me the way.

  • When you need to grab values out of a string with a variable number of different items in a variable order, it is not a job for one regex. Why do you have this requirement for a single regex? Commented Feb 21, 2013 at 9:22
  • If the string isn't regular, then you're asking for hassle by trying to apply one regular expression to it. Commented Feb 21, 2013 at 9:24
  • More detail is required on how the string to be parsed may vary. Please also provide code to adapt. Commented Feb 21, 2013 at 9:42
  • @NoobTom: I would say this is wrong! Grouping, backreferences, lookaheads, lookbehinds and the like will be very slow, because your whole file will be interpreted in one step. I think there will be a lot of backtracking, which makes it slow. Commented Feb 21, 2013 at 9:45
  • @NoobTom, I don't think you should call re.match on the entire pattern for each term. It would probably be better to have intermediate steps, like: split the string on whitespace, split the request on &, then check the terms for the things you want. If you can define exactly what will change and what won't, that will help you make the most efficient algorithm. For example, if the time is always the second term, you can just take it instead of testing for it. Commented Feb 21, 2013 at 10:01

1 Answer

It's probably not a good idea to use one regex to parse the whole string, but if you have to, I think the solution is to use named groups (see: Named groups on Regex Tutorial). A named group is captured with (?P<nameofgroup>bla).

So you can match for example the ip with:

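import re

# Avoid naming the variable "str", which would shadow the built-in type.
line = "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
# A raw string keeps the backslashes in the pattern intact.
match = re.search(r"\[(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]", line)
print(match.groupdict())  # {'ip': '127.0.0.1'}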

Just extend this regular expression with the patterns you need to match the other fields.

You can make a group optional by placing a ? after the group's parentheses, like: (?P<ip>pattern)?. If the pattern could not be matched, the corresponding element in the dict will be None.
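As a rough sketch of how the extended pattern could look, the following combines optional named groups for the leading fields with lookaheads for the request parameters, so that pubvalue and nonce are found in any order. The field patterns, the assumption that the leading fields keep their relative order, and the names LINE_RE and parse_line are illustrative guesses based on the two sample lines, not something the question specifies.

```python
import re

# Hypothetical pattern, derived only from the two sample lines in the question.
LINE_RE = re.compile(
    r"""
    ^\S+\s+                                                    # leading id, e.g. 3249dsf
    (?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})
    (?:\s+(?P<domain>[A-Za-z][\w.-]*)(?=\s|$)(?:\s+constant)?)?  # optional domain
    (?:\s+(?P<code>\d+))?                                      # optional integer code
    (?:\s+sync:\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\])?           # optional ip
    (?=(?:.*?[\s&]pubvalue=(?P<pubvalue>[^&\s]+))?)            # pubvalue, any position
    (?=(?:.*?[\s&]nonce=(?P<nonce>[^&\s]+))?)                  # nonce, any position
    """,
    re.VERBOSE,
)


def parse_line(line):
    """Return a dict of fields; fields that are absent from the line are None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    if fields["code"] is not None:
        fields["code"] = int(fields["code"])  # the question wants an integer
    return fields


if __name__ == "__main__":
    sample = ("123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 "
              "sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw"
              "&nonce=7896089hujoiuhiuh098h")
    print(parse_line(sample))
```

The optional part sits inside each lookahead, so a missing parameter leaves its group as None instead of making the whole match fail.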

But note: it is not a good idea to do this in a single regex. It can be slow (because of backtracking) and the regex will be long and hard to maintain!
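For comparison, here is a minimal sketch of the step-by-step alternative: split off the request part, classify the remaining whitespace-separated tokens, and split the request on &. The token rules (first token is an id to skip, the word "constant" is ignored, the first other word is the domain) and the name parse_in_steps are assumptions made for illustration only.

```python
import re

TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}")
IP_RE = re.compile(r"sync:\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_in_steps(line):
    """Parse one log line without a single all-in-one regex (illustrative only)."""
    # Everything after "Request:" is the parameter string; it may be absent.
    head, _, request = line.partition("Request:")
    fields = {"time": None, "domain": None, "code": None, "ip": None}

    # Assumption: the first token is an id that is not needed.
    for token in head.split()[1:]:
        ip_match = IP_RE.fullmatch(token)
        if TIME_RE.fullmatch(token):
            fields["time"] = token
        elif ip_match:
            fields["ip"] = ip_match.group(1)
        elif token.isdigit():
            fields["code"] = int(token)
        elif token != "constant" and fields["domain"] is None:
            fields["domain"] = token

    # Split "a=1&b=2" into {"a": "1", "b": "2"}; parameter order does not matter.
    params = dict(
        pair.split("=", 1) for pair in request.strip().split("&") if "=" in pair
    )
    fields["pubvalue"] = params.get("pubvalue")
    fields["nonce"] = params.get("nonce")
    return fields
```

Each step here uses only a small pattern or a plain string method, which tends to be easier to adjust when the line format changes.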

2 Comments

Thank you, I did not know about named groups! This is a great help for assigning the match to a variable!
So, have a look at the link in my edited answer: regular-expressions.info/named.html