I've read a dozen answers to questions like this, but none of them work for me.

I do not know how to compose a regex (it is beyond my abilities), and I need some help.

I'm trying to parse an Apache log file (in the default format, since I haven't changed any settings; I'm using XAMPP).

I've tried the following patterns but all of them miss some lines:

    $pat='/(\d+\.\d+\.\d+\.\d+) ([^\s]+) ([^\s]+) \[(\d+)\/(\w+)\/(\d+):(\d{1,2}:\d{1,2}:\d{1,2} ?[\+\-]?\d*)\] "(.*) (HTTP\/\d\.\d)" (\d+) (\d+) "([^"]*)" "([^"]*)"/';
    $pat="/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] [\w.]+ \"(\S+) (.*?) (\S+)\" (\S+) (\S+) (\".*?\") (\".*?\")$/";
    $pat='/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m';
    $pat='/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/';
    $pat='/^(\S+)\s \S+\s+ (?:\S+\s+)+ \[([^]]+)\]\s "(\S*)\s? (?:((?:[^"]*(?:\\")?)*)\s ([^"]*)"\s| ((?:[^"]*(?:\\")?)*)"\s) (\S+)\s (\S+)\s "((?:[^"]*(?:\\")?)*)"\s "(.*)"$/';

    preg_match($pat, $b, $m);

The first one is the best so far (113 misses on 1,000 records).

And here is a sample of a line that is missed:

10.21.142.253 - - [25/Oct/2014:07:42:36 -0200] "GET / HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0" 

It does work on lines like this:

127.0.0.1 - - [25/Oct/2014:08:49:51 -0200] "GET /xampp/ HTTP/1.1" 401 1392 "-" "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0" 

There seems to be something wrong with the file size field.

  • What lines do you want to get? Commented May 7, 2015 at 12:34

1 Answer

The only difference is the hyphen in the size field, which you can allow by adding it as an alternative, (\d+|-):

    (\d+(?:\.\d+){3})\s+(\S+)\s+(\S+)\s+\[(\d+)\/(\w+)\/(\d+):(\d{1,2}:\d{1,2}:\d{1,2}\s?[+-]?\d*)\] "(.*?)\s+(HTTP\/\d\.\d)"\s+(\d+)\s*(\d+|-)\s+"([^"]*)"\s+"([^"]*)"

I also "converted" the literal spaces to \s+. If you are not satisfied with that change, just use your original pattern with the alternation added:

    (\d+\.\d+\.\d+\.\d+) ([^\s]+) ([^\s]+) \[(\d+)\/(\w+)\/(\d+):(\d{1,2}:\d{1,2}:\d{1,2} ?[\+\-]?\d*)\] "(.*) (HTTP\/\d\.\d)" (\d+) (\d+|-) "([^"]*)" "([^"]*)"

See Demo 1 and Demo 2
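
For illustration, here is a minimal sketch of the \s+ pattern applied with preg_match to the two sample lines from the question. The array keys and the choice to map a "-" size to null are assumptions made for this example, not part of the original code.

```php
<?php
// Pattern with the (\d+|-) alternation for the size field.
$pat = '/(\d+(?:\.\d+){3})\s+(\S+)\s+(\S+)\s+\[(\d+)\/(\w+)\/(\d+):(\d{1,2}:\d{1,2}:\d{1,2}\s?[+-]?\d*)\] "(.*?)\s+(HTTP\/\d\.\d)"\s+(\d+)\s*(\d+|-)\s+"([^"]*)"\s+"([^"]*)"/';

// The two sample lines from the question: one with "-" as the size, one with a number.
$lines = [
    '10.21.142.253 - - [25/Oct/2014:07:42:36 -0200] "GET / HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0"',
    '127.0.0.1 - - [25/Oct/2014:08:49:51 -0200] "GET /xampp/ HTTP/1.1" 401 1392 "-" "Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0"',
];

$parsed = [];
foreach ($lines as $line) {
    if (preg_match($pat, $line, $m) !== 1) {
        continue; // skip lines the pattern does not match
    }
    $parsed[] = [
        'ip'      => $m[1],
        'request' => $m[8],                              // method and path
        'status'  => (int) $m[10],
        'bytes'   => $m[11] === '-' ? null : (int) $m[11], // assumption: "-" becomes null
    ];
}

print_r($parsed);
```

Group 11 is the size field, so that is the only capture that needs special handling for the hyphen.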

2 Comments

It works fine. $pat='/(\d+(?:\.\d+){3})\s+(\S+)\s+(\S+)\s+\[(\d+)\/(\w+)\/(\d+):(\d{1,2}:\d{1,2}:\d{1,2}\s?[+-]?\d*)\] "(.*?)\s+(HTTP\/\d\.\d)"\s+(\d+)\s*(\d+|-)\s+"([^"]*)"\s+"([^"]*)"/'; results in 100,000 reads and 0 failures. Thank you.
Great. I think \s is really useful in longer regular expressions that need debugging or updating. Also, instead of [^\s], it is safer to use \S (non-whitespace).
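
As a quick check of the last point, this small sketch compares [^\s] and \S on a made-up three-field string (the sample value is hypothetical, chosen only for this example). Both forms should capture the same fields; \S is simply shorter to read.

```php
<?php
// Hypothetical sample: three whitespace-separated fields.
$sample = '127.0.0.1 - frank';

// Negated class, as in the original pattern.
preg_match('/^([^\s]+) ([^\s]+) ([^\s]+)$/', $sample, $a);

// Shorthand \S, as suggested in the comment.
preg_match('/^(\S+) (\S+) (\S+)$/', $sample, $b);

// Compare the two match arrays.
var_dump($a === $b);
```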
