1

I am parsing following apache log entry

59.167.203.103 - - [28/May/2013:03:12:47 +0000] "POST /some/some.htm HTTP/1.1" 200 1187 "-" "xyzf/2.00.16 xyzNetwork/609.1.4 xyzwin/13.0.0"

with given below RegEx and its working fine.

String logentrypattern = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";

But in few entries responsebytes are "-" instead of some value, this is giving me following erorr and saying unable to parse. plz help

Bad log entry (or problem with RE?):
89.178.46.54 - - [24/May/2013:17:04:59 +0000] "PUT /xyz-pmp/xyz-pmp.htm HTTP/1.1" 200 - "-" "kdm/1.0"

1 Answer 1

1

You could try this:

^([\\d.]+) (\\S+) (\\S+) \\[([\\w:\/]+\\s[+\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|-) \"([^\"]+)\" \"([^\"]+)\"
                                                                                 ^^

I added the bit where you can have a dash. Maybe it'd be better for you to have a \\S+ block instead there? Well, it'll all depend on what you're doing exactly. If the intent is to accept only the entries with digits, then your regex is working as intended. If it's just to capture the different parts of the entries, make sure you know the structure of the data and the different forms they can come to you.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.