I have a problem with a JavaScript regular expression.

This is a heavily simplified regex that demonstrates my problem:

(?:\s(\+\d\w*))|(\w+)

This regex should only match strings that don't contain forbidden characters (anything that is not a word character).

The only exception is the + symbol. A match may start with +, provided a digit [0-9] follows it immediately. A + must not appear within a word (44+44 is not a valid match, but +4ad is).

To allow the + only at the beginning of a token, I required a whitespace character before it. However, I don't want that whitespace to be part of the match.

I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.

There are two issues with this regex:

  • If I use it in my JS code, the space is always part of the match.
  • If +42 appears at the beginning of a line, it isn't matched.

My questions:

  • What should the regex look like?
  • Why does this regex add the space to the matches?

Here's my JS code:

var input =  "+5ad6  +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
  • The space: see stackoverflow.com/questions/7505762/… -- using a non-capturing group does not exclude it from the total match, only from adding it to individual matching groups. Commented May 11, 2014 at 12:27
  • @Jongware so is there a way to extract the token without the space? Unfortunately, JS doesn't support lookbehinds Commented May 11, 2014 at 12:45

1 Answer

What should the regex look like?

You've got multiple choices to reach your goal:

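  • It's fine as you have it, though you might also allow the start of the string in place of the whitespace. Then read the capturing groups (groups 1 and 2) of each match rather than the full match; they do not include the whitespace. Note that String.prototype.match with the g flag returns only the full matches, so use regexp.exec in a loop to get at the groups.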
  • Outside of JavaScript, a lookbehind could help. Unfortunately, JavaScript did not support lookbehinds when this was written (they were only added later, in ES2018).
  • Require a non-word-boundary (\B) before the +, so that any \w character directly before the + prevents the match:

    /\B\+\d\w+|\w+/
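
A minimal sketch of the first and third options, assuming the sample input from the question and using \w* after the digit, as the question's pattern does. The helper name collectTokens is made up for illustration. Because match with the g flag returns only full matches, the first variant loops over exec to reach the capturing groups:

```javascript
// Sample input taken from the question.
const input = "+5ad6  +5ad6 sd asd+as +we";

// Option 1: keep the whitespace in the pattern, also accept the start of the
// string, and read the capturing groups instead of the full match.
// collectTokens is a hypothetical helper name, not part of any library.
function collectTokens(text) {
  const pattern = /(?:^|\s)(\+\d\w*)|(\w+)/g;
  const tokens = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    // Exactly one of the two groups participates in each match.
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

// Option 3: \B rejects a "+" that directly follows a word character,
// so no whitespace has to be consumed at all.
const viaNonBoundary = input.match(/\B\+\d\w*|\w+/g);

console.log(collectTokens(input));
console.log(viaNonBoundary);
```

For the sample input, both variants should yield the same six tokens: +5ad6, +5ad6, sd, asd, as, we. None of them carries leading whitespace, and the + inside asd+as is dropped.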
    

Why does this regex add the space to the matches?

Because the regex does match the whitespace, so it is part of the full match. The non-capturing group (?:…) only means that \s(\+\d\w+) as a whole is not added to the captured groups.
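
The difference between the full match and the captured groups can be seen in a small sketch. The input string " +42" is chosen purely for illustration; the pattern is the one from the question:

```javascript
// Pattern from the question, once without and once with the g flag.
const sample = " +42";

// Without g, match() returns the full match followed by the capturing groups.
const result = sample.match(/(?:\s(\+\d\w*))|(\w+)/);
console.log(JSON.stringify(result[0])); // " +42"  (full match, space included)
console.log(JSON.stringify(result[1])); // "+42"   (group 1, no space)

// With g, match() returns only the full matches, so the space shows up.
const allMatches = sample.match(/(?:\s(\+\d\w*))|(\w+)/g);
console.log(JSON.stringify(allMatches)); // [" +42"]
```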

7 Comments

The regex can even be improved: (\B\+\d)?\w+
Yes, or (?:\B\+\d)?\w+. I just tried not to make a large change to the original expression.
I take it back - this simplification doesn't work, because it would force tokens starting with a + to have at least 2 characters afterwards
But doesn't the original expression do the same? Or did you mean \+\d\w*
Yes, I meant \+\d\w*