10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] "GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 "https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action?messageId=errors.Authentication.001" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET CLR 1.1.4322)"

The line above is from an access log. It contains two action IDs. I want to use a regex pattern to extract the first one, the action ID just before HTTP. Currently I use the pattern ([^/\"]*).action, but it matches both action IDs anywhere in the line. I have been stuck on this problem for two days. Could you please help me?
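A minimal Python sketch that reproduces the problem. The question doesn't say which language is in use, so Python's re module is an assumption here, and LOG_LINE and QUESTION_PATTERN are illustrative names. re.findall returns every non-overlapping match, so the pattern from the question yields both IDs:

```python
import re

# The access log line from the question, split only for readability.
LOG_LINE = (
    '10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] '
    '"GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 '
    '"https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action'
    '?messageId=errors.Authentication.001" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; '
    'SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; '
    'Media Center PC 6.0; .NET CLR 1.1.4322)"'
)

# The question's pattern as a raw string; inside a regex, \" is simply a
# literal double quote.
QUESTION_PATTERN = r'([^/\"]*).action'

# findall() scans the whole line, so both action IDs are returned.
all_ids = re.findall(QUESTION_PATTERN, LOG_LINE)
print(all_ids)  # ['ga004rp0002', 'ga013rp0004']
```

Using re.search instead returns only the leftmost match, which happens to be the first ID here, but nothing in the pattern itself ties the match to the request path.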

  • What language are you using? Commented May 27, 2013 at 2:43
  • There is /infovia/ga/ga004rp0002.action surrounded by blanks, and there is "https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action?messageId=errors.Authentication.001". If you're looking for the first, why not delimit your search regex with spaces so that it picks up the first and not the second? Commented May 27, 2013 at 3:00
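Following the comment's suggestion, here is a Python sketch that requires a blank on each side of the request path. The pattern and the names LOG_LINE and SPACE_DELIMITED are illustrative, not taken from any answer. The second ".action" is followed by "?" rather than a space, so it cannot match:

```python
import re

LOG_LINE = (
    '10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] '
    '"GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 '
    '"https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action'
    '?messageId=errors.Authentication.001" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; '
    'SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; '
    'Media Center PC 6.0; .NET CLR 1.1.4322)"'
)

# Hypothetical pattern: a space, then a path (\S* up to the last slash),
# then the ID, then ".action" followed by another space.
SPACE_DELIMITED = re.compile(r' \S*/([^/\s]+)\.action ')

match = SPACE_DELIMITED.search(LOG_LINE)
first_id = match.group(1) if match else None
print(first_id)  # ga004rp0002
```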

4 Answers


This will match the first id:

([^/\s]+)\.action \S+" (\d+)

Get group 1 from the match; it holds the action ID (group 2 holds the HTTP status code).
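The same idea as a Python sketch (LOG_LINE and FIRST_ACTION are illustrative names). Only the first ".action" is followed by a space, the HTTP version and a closing quote, so anchoring on that context selects the request path, and the capturing group goes around the ID itself:

```python
import re

LOG_LINE = (
    '10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] '
    '"GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 '
    '"https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action'
    '?messageId=errors.Authentication.001" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; '
    'SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; '
    'Media Center PC 6.0; .NET CLR 1.1.4322)"'
)

# Group 1: the action ID (no slashes or whitespace).
# Then ".action", a space, the HTTP version, a closing quote, a space,
# and group 2: the numeric status code.
FIRST_ACTION = re.compile(r'([^/\s]+)\.action \S+" (\d+)')

m = FIRST_ACTION.search(LOG_LINE)
action_id, status = m.group(1), m.group(2)
print(action_id, status)  # ga004rp0002 302
```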


Try this:

(?<=GET\s).*?([^/\"]*)\.action

or use this:

([^/\"]*)\.action.*?([^/\"]*)\.action

and get group 1.

Explanation:

*? matches the previous element zero or more times, but as few times as possible.

(?<=subexpression) is a zero-width positive lookbehind assertion.
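Both suggestions can be tried in Python, whose re module supports fixed-width lookbehind such as (?<=GET\s). This is a sketch with illustrative names; the dots before "action" are escaped so they match only a literal dot, and the backslash before the double quote is dropped because it isn't needed inside a character class:

```python
import re

LOG_LINE = (
    '10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] '
    '"GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 '
    '"https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action'
    '?messageId=errors.Authentication.001" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; '
    'SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; '
    'Media Center PC 6.0; .NET CLR 1.1.4322)"'
)

# Option 1: start right after "GET ", then lazily skip to the first ".action".
LOOKBEHIND = re.compile(r'(?<=GET\s).*?([^/"]*)\.action')

# Option 2: match both IDs at once; group 1 is the first, group 2 the second.
BOTH_LAZY = re.compile(r'([^/"]*)\.action.*?([^/"]*)\.action')

first_via_lookbehind = LOOKBEHIND.search(LOG_LINE).group(1)
pair = BOTH_LAZY.search(LOG_LINE)
first_via_pair = pair.group(1)

print(first_via_lookbehind)  # ga004rp0002
print(first_via_pair)        # ga004rp0002
```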


If I understand your question correctly, your problem is that there are two "action IDs" in the string and you want to capture both. Your current regex matches both, but depending on how you evaluate it, you may only be getting the first match. To extract both with a single match, repeat the regex and consume everything between the parts you want to capture:

([^/\"]*)\.action.*?([^/\"]*)\.action

This is your regex ([^/\"]*).action repeated twice (with the dot escaped as \. so it matches only a literal dot), joined by .*?, which matches any characters but as few as possible. Both action IDs are then available in capturing groups one and two. A greedy .* in the middle would not work: it runs to the end of the line and backtracks only as far as the final ".action", leaving group two empty.
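A Python sketch (illustrative names) comparing a greedy and a lazy middle on the question's log line. With .* the engine backtracks from the end of the line only until the final ".action" fits, so the second group matches the empty string; .*? instead stops at the first position where the rest of the pattern can match:

```python
import re

LOG_LINE = (
    '10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] '
    '"GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 '
    '"https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action'
    '?messageId=errors.Authentication.001" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; '
    'SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; '
    'Media Center PC 6.0; .NET CLR 1.1.4322)"'
)

# Greedy middle: .* overshoots, so group 2 ends up empty.
GREEDY = re.compile(r'([^/"]*)\.action.*([^/"]*)\.action')
# Lazy middle: .*? stops early, so group 2 gets the whole second ID.
LAZY = re.compile(r'([^/"]*)\.action.*?([^/"]*)\.action')

greedy_groups = GREEDY.search(LOG_LINE).groups()
lazy_groups = LAZY.search(LOG_LINE).groups()
print(greedy_groups)  # ('ga004rp0002', '')
print(lazy_groups)    # ('ga004rp0002', 'ga013rp0004')
```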


If you're sure it will always be followed by HTTP, you can use a lookahead:

([^/\"]*)\.action(?=\sHTTP)
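A Python sketch of the lookahead approach (illustrative names). The lookahead (?=\sHTTP) is zero-width, so it constrains where the match may end without becoming part of the matched text:

```python
import re

LOG_LINE = (
    '10.177.116.76 - U031503@nttdata [11/Mar/2013:09:42:44 +0900] '
    '"GET /infovia/ga/ga004rp0002.action HTTP/1.1" 302 301 '
    '"https://tb-infovia.groupwide.net/infovia/ga/ga013rp0004.action'
    '?messageId=errors.Authentication.001" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; '
    'SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; '
    'Media Center PC 6.0; .NET CLR 1.1.4322)"'
)

# Accept an ID only if ".action" is immediately followed by whitespace
# and "HTTP"; the second ".action" is followed by "?", so it is skipped.
BEFORE_HTTP = re.compile(r'([^/"]*)\.action(?=\sHTTP)')

m = BEFORE_HTTP.search(LOG_LINE)
print(m.group(1))  # ga004rp0002
print(m.group(0))  # ga004rp0002.action (the lookahead text is not included)
```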
