
I'm trying to write a JavaScript regex to use with split(), but I'm totally stuck. Here's my input:

9:30 pm
The user did action A.

10:30 pm
Welcome, user John Doe.

***This is a comment

11:30 am
This is some more input.

I want the output array after the split() to be (I've removed the \n for readability):

["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30 am This is some more input." ];

My current regular expression is:

var split = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

This works, but there is one problem: the timestamps get repeated in extra elements. So I get:

["9:30", "9:30 pm The user did action A.", "10:30",  "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30", "11:30 am This is some more input." ];

I can't split on the newlines \n because they aren't consistent, and sometimes there may be no newlines at all.

Could you help me out with a Regex for this?

Thanks so much!!

EDIT: in reply to phleet

It could look like this:

9:30 pm
The user did action A.

He also did action B

10:30 pm Welcome, user John Doe.

Basically, there may or may not be a newline after the timestamp, and there may be multiple newlines for the event description.

  • Can you provide the kind of input you're talking about in which there are no newlines? Do you mean there are no empty lines, or no newlines? Commented Jun 18, 2010 at 7:42

1 Answer

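I believe the issue is how JavaScript's split treats capturing groups: when the separator regex contains a capturing group, the text captured by that group is also added to the result array. The solution may just be to use a non-capturing group in your pattern. That is, instead of: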

/\s*(?=(\b\d+:\d+|\*\*\*))/

Use

/\s*(?=(?:\b\d+:\d+|\*\*\*))/
        ^^

The (?:___) is what is called a non-capturing group.
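
To see the difference in isolation, here is a minimal sketch. The string and variable names are made up for illustration and are not from the question:

```javascript
// Hypothetical sample string, not taken from the question.
const sample = "a1b2c";

// Capturing group: the captured digits are spliced into the result.
const withCapture = sample.split(/(\d)/);
console.log(withCapture); // ["a", "1", "b", "2", "c"]

// Non-capturing group: only the pieces between separators are returned.
const withoutCapture = sample.split(/(?:\d)/);
console.log(withoutCapture); // ["a", "b", "c"]
```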

Looking at the overall pattern, however, the grouping is not actually needed. You should be able to just use:

/\s*(?=\b\d+:\d+|\*\*\*)/
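
As a quick check, here is a sketch that applies this pattern to the sample input from the question. The variable names (logText, splitPattern, and so on) are only for illustration, and the newlines inside each entry are left in place:

```javascript
// Sample input from the question, rebuilt with explicit newlines.
const logText = [
  "9:30 pm",
  "The user did action A.",
  "",
  "10:30 pm",
  "Welcome, user John Doe.",
  "",
  "***This is a comment",
  "",
  "11:30 am",
  "This is some more input.",
].join("\n");

// Split on optional whitespace that is followed by a timestamp or "***".
// The lookahead has no capturing group, so nothing extra is added.
const splitPattern = /\s*(?=\b\d+:\d+|\*\*\*)/;

const entries = logText.split(splitPattern);
console.log(entries.length); // 4
console.log(entries);

// The same pattern should also cope with input that has no newlines at all.
const oneLine = "9:30 pm The user did action A. 10:30 pm Welcome, user John Doe.";
const oneLineEntries = oneLine.split(splitPattern);
console.log(oneLineEntries.length); // 2
```

One caveat: anything in an event description that looks like a time (digits, a colon, digits) would also start a new element.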

Minor point

Instead of \*\*\*, you could use [*]{3}. This may be more readable. The * is not a meta-character inside a character class definition, so it doesn't have to be escaped. The {3} is how you denote "exactly 3 repetitions of" the preceding item.
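
A small sketch to confirm that the two spellings behave the same on a shortened version of the question's input (the variable names are illustrative only):

```javascript
// Shortened input based on the question's sample.
const shortLog = "9:30 pm\nThe user did action A.\n\n***This is a comment\n\n11:30 am\nMore input.";

// Escaped asterisks versus a character class with a {3} quantifier.
const escapedForm = shortLog.split(/\s*(?=\b\d+:\d+|\*\*\*)/);
const classForm = shortLog.split(/\s*(?=\b\d+:\d+|[*]{3})/);

console.log(classForm.length); // 3
console.log(JSON.stringify(escapedForm) === JSON.stringify(classForm)); // true
```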

1 Comment

Brilliant, thanks so much! This completely solves the problem.
