
I'm trying to use regular expressions to parse text like this:

'''ErrorID:  951574305
Time:     Mon Apr 25 16:01:34 CEST 2011
URL:      /documents.do
HttpCode: null
Error:    class java.lang.NullPointerException: null''' 

The keywords ErrorID:, Time:, and URL: are always the same, and I need to search for them. How can I parse this text?

  • Seems like overkill for a regex; you could just split on newlines, then on the first colon, and trim the whitespace. Commented Apr 25, 2011 at 18:24
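A minimal sketch of the split-based approach from the comment, in Python. The function name parse_log_entry is hypothetical. It splits each line only on its first colon, because the Time value (16:01:34) and the Error value both contain colons of their own.

```python
def parse_log_entry(text):
    """Hypothetical helper: turn 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # maxsplit=1 splits on the first colon only, so "16:01:34" stays intact
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields
```

Called on the log entry from the question, this returns a dict such as {'ErrorID': '951574305', 'Time': 'Mon Apr 25 16:01:34 CEST 2011', 'URL': '/documents.do', ...}, so each field is a plain dictionary lookup.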

3 Answers

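import re
# text holds the log entry from the question.
# Raw strings (r"...") stop Python from treating \s as a string escape.
re.findall(r"ErrorID:\s+(.*)", text)
# ['951574305']
re.findall(r"Time:\s+(.*)", text)
# ['Mon Apr 25 16:01:34 CEST 2011']
re.findall(r"URL:\s+(.*)", text)
# ['/documents.do']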

The regex works like this: it matches ErrorID: (or whichever keyword you are looking for), followed by one or more whitespace characters, followed by the rest of the line up to the newline or the end of the string. Because only that last part is inside a capturing group, findall returns just the text after the whitespace. The result is a list, so you need its first item. There are other ways to find what you need, but I found this one the most appropriate.
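If you only ever want the first match, re.search with Match.group(1) avoids indexing into a list. A small sketch; extract_field is a hypothetical helper name, and it returns None when the keyword is absent instead of raising an IndexError on an empty list.

```python
import re


def extract_field(text, keyword):
    """Hypothetical helper: return the value after 'keyword:' or None."""
    # re.escape guards against keywords that contain regex metacharacters
    match = re.search(re.escape(keyword) + r":\s+(.*)", text)
    return match.group(1) if match else None
```

For example, extract_field(text, "ErrorID") gives '951574305'. Note that \s+ can also match a newline, so a field with an empty value would pick up the following line; using [ \t]+ instead is one way to prevent that.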


1 Comment

I just need a regex that matches this pattern; I don't care about the values.

If your regex implementation supports named groups:

/ErrorID:\s+(?<ID>.*)\nTime:\s+(?<Time>.*)\nURL:\s+(?<URL>.*)/g

You can then reference them by name.
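Python's re module writes named groups as (?P<name>...), and instead of a /g flag you would use re.finditer to get every match. A sketch of the same pattern in Python, assuming the log entry is stored in a string called text:

```python
import re

text = (
    "ErrorID:  951574305\n"
    "Time:     Mon Apr 25 16:01:34 CEST 2011\n"
    "URL:      /documents.do\n"
    "HttpCode: null\n"
    "Error:    class java.lang.NullPointerException: null"
)

# Named groups use the (?P<name>...) spelling in Python's re module.
pattern = re.compile(
    r"ErrorID:\s+(?P<ID>.*)\nTime:\s+(?P<Time>.*)\nURL:\s+(?P<URL>.*)"
)

match = pattern.search(text)
if match:
    print(match.group("ID"))  # 951574305
    print(match["Time"])      # Mon Apr 25 16:01:34 CEST 2011
    print(match.groupdict())  # all three named groups as a dict
```

Because . does not match a newline, each .* stops at the end of its line, which is what lets the \n between the groups line up.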

Otherwise, reference them by index:

/ErrorID:\s+(.*)\nTime:\s+(.*)\nURL:\s+(.*)/g

$1 for ID, $2 for Time and $3 for URL.
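In Python, $1 corresponds to match.group(1) when reading a match, and to \1 (or \g<1>) inside a re.sub replacement string. A sketch, again assuming the log entry is stored in text; the variable names and the replacement format are only for illustration.

```python
import re

text = (
    "ErrorID:  951574305\n"
    "Time:     Mon Apr 25 16:01:34 CEST 2011\n"
    "URL:      /documents.do\n"
    "HttpCode: null\n"
    "Error:    class java.lang.NullPointerException: null"
)

pattern = re.compile(r"ErrorID:\s+(.*)\nTime:\s+(.*)\nURL:\s+(.*)")

match = pattern.search(text)
if match:
    # groups() returns group 1, 2 and 3 in order
    error_id, timestamp, url = match.groups()

# In a replacement string, Python writes $1 as \1 (or \g<1>).
summary = pattern.sub(r"\1 at \2 on \3", text)
```

Only the matched first three lines are replaced, so summary starts with "951574305 at Mon Apr 25 16:01:34 CEST 2011 on /documents.do" and the HttpCode and Error lines follow unchanged.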


If all of these keywords must appear in the string, you don't know where they are, and your engine supports lookahead assertions:

(?=[\S\s]*ErrorID:)(?=[\S\s]*Time:)(?=[\S\s]*URL:)
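To use the lookahead pattern as a yes/no check in Python, here is a sketch (has_all_keywords is a hypothetical name). Each lookahead is zero-width and re.match tries it at the start of the string, so the pattern succeeds only when every keyword appears somewhere, in any order.

```python
import re

# [\S\s]* matches any character including newlines, so each lookahead
# scans the whole string for its keyword without consuming anything.
REQUIRED_KEYWORDS = re.compile(r"(?=[\S\s]*ErrorID:)(?=[\S\s]*Time:)(?=[\S\s]*URL:)")


def has_all_keywords(text):
    """Return True if ErrorID:, Time: and URL: all occur in text."""
    return REQUIRED_KEYWORDS.match(text) is not None
```

The match itself is empty; you only check whether it exists. Passing re.DOTALL and writing .* would be an equivalent way to let each lookahead cross newlines.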
