I have the following method for splitting a log line into fields. The log format is exactly as shown below, but the values may change:
29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13
The code is as follows:
String regex = "^([0-9-]*)\\s([0-9:]*)\\s([0-9\\\\.]*)\\s([0-9]*|-)\\s([0-9\\\\.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s([a-zA-Z0-9\\\\./]*)\\s([a-zA-Z0-9:./]*)\\s(.*)\\s(.*)";
String pattern = "$1~~$2~~$3~~$4~~$5~~$6~~$7~~$8~~$9~~$10~~$11~~$12~~$13~~$14";
String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
List<Object> params = new ArrayList<Object>();
String formattedString = values.replaceAll(regex, pattern);
String[] fields = formattedString.split("~~");
for (String field : fields) {
    params.add(field);
}
System.out.println(params);
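Problem:
The log line is not split correctly.
Everything up to the URL (http://in.sample.com/parties/) is separated as expected; the problem starts after it.
The user agent contains spaces, so the URL and most of the user agent end up in one field, and only the last space-separated token (Safari/525.13) lands in the final field.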
Actual output:
[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29, Safari/525.13]
Required Output:
[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/, Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13]
Any help would be appreciated.
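For reference, here is a minimal sketch of one possible direction, not a definitive fix. It assumes the URL field never contains whitespace, so the 13th group can be written as \S+ instead of a greedy .*, which leaves the rest of the line, spaces included, for the user agent. The class name LogSplitter and the method splitLogLine are hypothetical names chosen for illustration. The character classes are written as [0-9.] because a dot needs no escaping inside a character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class LogSplitter {

    // Same 14 fields as the regex in the question; only the last two groups differ.
    private static final Pattern LOG_LINE = Pattern.compile(
        "^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)"
        + "\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s([a-zA-Z0-9./]*)"
        + "\\s([a-zA-Z0-9:./]*)"
        + "\\s(\\S+)"    // URL: assumed to contain no whitespace
        + "\\s(.*)$");   // user agent: the rest of the line, spaces included

    // Returns the 14 captured fields, or throws if the line does not match.
    static List<String> splitLogLine(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected log format: " + line);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 "
            + "HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ "
            + "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
            + "(KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
        System.out.println(splitLogLine(values));
    }
}
```

Reading the groups straight from the Matcher also avoids the replaceAll/split("~~") round trip, which would mis-split any line whose fields happen to contain "~~".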