I have the following method for splitting log lines. The log format is exactly as shown below, but the values may change.

29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13

Code as follows:

     String regex = "^([0-9-]*)\\s([0-9:]*)\\s([0-9\\\\.]*)\\s([0-9]*|-)\\s([0-9\\\\.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s([a-zA-Z0-9\\\\./]*)\\s([a-zA-Z0-9:./]*)\\s(.*)\\s(.*)";
     String pattern = "$1~~$2~~$3~~$4~~$5~~$6~~$7~~$8~~$9~~$10~~$11~~$12~~$13~~$14";
     String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
     List<Object> params = new ArrayList<Object>();
     String formattedString = values.replaceAll(regex, pattern);
     String[] fields = formattedString.split("~~");
     for (String field : fields) {
        params.add(field);
      }
     System.out.println(params);

Problem:

The log is not split correctly.

The problem starts after the URL http://in.sample.com/parties/.

The user agent contains spaces, so the log separation does not work as expected.

Output:

[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29, Safari/525.13]

Required Output:

[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/, Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML like Gecko) Chrome/0.2.149.29 Safari/525.13]

Any help will be great.

2 Answers

You don't need a regex for that. Since your log always contains 14 fields and the problematic spaces are all in the last field, all you need is the split method with its second parameter (limit):

String[] fields = values.split(" ", 14);
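
To make the effect of the limit concrete, here is a minimal, self-contained sketch. The class name `LogSplitDemo`, the method `splitLog`, and the constant `FIELD_COUNT` are made up for illustration. It assumes every line has exactly 14 space-separated fields and that only the last one contains spaces.

```java
import java.util.Arrays;
import java.util.List;

class LogSplitDemo {
    // Assumption: 14 fields per line, and only the last field
    // (the user agent) may contain spaces.
    static final int FIELD_COUNT = 14;

    // Hypothetical helper: splits one log line into its fields.
    static List<String> splitLog(String line) {
        // With a positive limit, split returns at most FIELD_COUNT elements;
        // everything after the 13th space stays intact in the last element.
        return Arrays.asList(line.split(" ", FIELD_COUNT));
    }

    public static void main(String[] args) {
        String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 "
                + "HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ "
                + "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
                + "(KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";

        List<String> fields = splitLog(values);
        System.out.println(fields.size());   // number of fields
        System.out.println(fields.get(12));  // the URL
        System.out.println(fields.get(13));  // the whole user agent, spaces included
    }
}
```

Note that a line with fewer than 14 fields simply yields a shorter list, so real code would probably want to check the size before indexing.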

3 Comments

Thanks, it is working. But if I want to add some fields at the end of the log, there will be a problem: it will split at the spaces in the user agent.
@karthick.k: you can add the new fields before the user agent, or put quotes around the last field and use " ('[^']+')?" as the split pattern.
@karthick.k: Sorry, you can't use the second method in Java.

I believe you're missing the part that matches HTTP/1.1. Try this regex:

String regex = "(?i)^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s(HTTP/1\\.[01])\\s([A-Z0-9./]*)\\s([A-Z0-9:./]*)\\s(.*)";

It gives:

["29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13", "29-11-2013", "19:18:53", "192.2.2.22", "66", "192.2.2.22", "8080", "GET", "402", "103", "103", "HTTP/1.1", "192.2.2.22", "http://in.sample.com/parties/", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13"]
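
For reference, here is a rough sketch of how such a pattern could be applied in Java with `Pattern` and `Matcher`. The class name `LogRegexDemo` and the method `parse` are invented for this example, and the regex is written with Java string escaping. Unlike the array shown above, the sketch skips group 0 (the full match) and collects only the 14 captured fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogRegexDemo {
    // Same field layout as the log sample; (?i) makes the letter classes
    // case-insensitive, and the final (.*) swallows the whole user agent.
    static final Pattern LOG_LINE = Pattern.compile(
            "(?i)^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)"
            + "\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s(HTTP/1\\.[01])"
            + "\\s([A-Z0-9./]*)\\s([A-Z0-9:./]*)\\s(.*)");

    // Hypothetical helper: returns the captured fields,
    // or an empty list if the line does not match.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            // Groups are numbered from 1; group 0 is the entire match.
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 "
                + "HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ "
                + "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
                + "(KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
        System.out.println(parse(values));
    }
}
```

Reading the groups from a `Matcher` also avoids the `replaceAll` plus `split("~~")` round trip in the question, which would break if a field ever contained `~~`.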

Alternatively, you could look for a dedicated log parser and use that.
