This might be very simple. I just want to match all the text between two marker strings, including line breaks. Example:

textfile:

MESSAGE BEGIN

mary had a little lamb.

little lamb

MESSAGE END

output expectation:

mary had a little lamb.

little lamb

Here is what I currently have. It works okay, except that everything ends up on one line.

Code (I currently have):

$pattern = "MESSAGE BEGIN(.*?)MESSAGE END"

[regex]::Match($text,$pattern).Groups[1].Value

result:

mary had a little lamb.little lamb

I would like it to respect line breaks, so that they are not all crammed together.

  • Are you sure that the line breaks are not there? I suggest that maybe they are there, but you just can't see them in the tool you are using. Commented Apr 26, 2018 at 2:33
  • @wp78de But it appears that dot is already matching across newlines. Commented Apr 26, 2018 at 2:34
  • Content comes from a text file, where there is a return, I guess. It matches exactly what I want it to match, but it doesn't respect the newline\break. I am sorry if I am using the wrong term. Commented Apr 26, 2018 at 2:42
  • If the line breaks are in your file, then they should be retained. I guess the problem is the way you read the file. Commented Apr 26, 2018 at 2:53
  • ([\s\S]*?) is not quite it, but it worked better than the others. Same output as my original (.*?). Commented Apr 26, 2018 at 2:54

4 Answers

1

Use lookarounds:

(?<=MESSAGE BEGIN)[\s\S]+(?=MESSAGE END)

This matches any text between (but not including) MESSAGE BEGIN and MESSAGE END.
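As a rough illustration of the pattern above, here is a minimal sketch in JavaScript rather than PowerShell, assuming an engine with lookbehind support (such as a current Node.js). The `sample` string and the variable names are made up for the example:

```javascript
// Hypothetical sample input; not read from a real file.
const sample = "MESSAGE BEGIN\n\nmary had a little lamb.\n\nlittle lamb\n\nMESSAGE END";

// Lookbehind and lookahead assert the markers without consuming them.
const lookaround = /(?<=MESSAGE BEGIN)[\s\S]+(?=MESSAGE END)/;

const found = sample.match(lookaround);
const inner = found ? found[0] : null;

// JSON.stringify makes the retained \n characters visible.
console.log(JSON.stringify(inner));
```

The matched text starts and ends with the line breaks that surround the message body, so you may want to trim it before printing.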

For a discussion of the regular expressions supported in PowerShell, see: https://blogs.technet.microsoft.com/heyscriptingguy/2016/10/21/powershell-regex-crash-course-part-4-of-5/

1

The first part here is to use a pattern like [\s\S]* instead of . so that newlines are matched too. You also want a lazy quantifier (+? or *?) to avoid matching too much (e.g. from the first MESSAGE BEGIN to the last MESSAGE END if there are multiple message blocks).

Pattern:

MESSAGE BEGIN([\s\S]*?)MESSAGE END

Or, if you just want the inner part, use lookarounds (still with the lazy *?):

(?<=MESSAGE BEGIN)[\s\S]*?(?=MESSAGE END)
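To see why the lazy quantifier matters, here is a small sketch in JavaScript (not PowerShell) that runs the capture-group pattern over a made-up input containing two blocks. The input and the variable names are assumptions for illustration only:

```javascript
// Hypothetical input with two message blocks and some text in between.
const twoBlocks = [
  "MESSAGE BEGIN",
  "first",
  "MESSAGE END",
  "noise",
  "MESSAGE BEGIN",
  "second",
  "MESSAGE END",
].join("\n");

// Lazy quantifier: each match stops at the nearest MESSAGE END.
const lazyBodies = [...twoBlocks.matchAll(/MESSAGE BEGIN([\s\S]*?)MESSAGE END/g)]
  .map((m) => m[1].trim());

// Greedy quantifier: one match runs from the first BEGIN to the last END.
const greedyBodies = [...twoBlocks.matchAll(/MESSAGE BEGIN([\s\S]*)MESSAGE END/g)]
  .map((m) => m[1].trim());

console.log(lazyBodies);          // one entry per block
console.log(greedyBodies.length); // a single, over-long entry
```

With the lazy version each block is returned separately. The greedy version swallows the text between the blocks, including the inner markers.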

End-to-end code sample:

$text = [IO.File]::ReadAllText(".\a.txt")

# Avoid naming this $matches: that is a PowerShell automatic variable.
$found = [regex]::Matches($text, "MESSAGE BEGIN([\s\S]*?)MESSAGE END")
foreach ($match in $found) {
  # Write-Output $match.Value.Trim()  # use this line instead with the look-around pattern
  Write-Output $match.Groups[1].Value.Trim()
}

1 Comment

Thanks so much, MESSAGE BEGIN([\s\S]*?)MESSAGE END did it.

0
MESSAGE BEGIN(\s|\S)*MESSAGE END

(.*?) matches all characters, except for line terminators.

\s matches any whitespace character (equal to [\r\n\t\f\v ])

\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])

Put a bar | inside the group to match either \s or \S.

Then put a star * after the group to match zero or more characters.
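The points above can be checked with a short sketch. It is written in JavaScript rather than PowerShell, so it only illustrates the general regex behavior; the `msg` string and the variable names are made up for the example:

```javascript
// Hypothetical two-line message.
const msg = "MESSAGE BEGIN\nline one\nline two\nMESSAGE END";

// Without the s (dotAll) flag, . stops at line terminators, so this fails.
const dotOnly = /MESSAGE BEGIN(.*?)MESSAGE END/.test(msg);

// (\s|\S) and [\s\S] both accept every character, newlines included.
const alternation = /MESSAGE BEGIN(\s|\S)*MESSAGE END/.test(msg);
const charClass = /MESSAGE BEGIN[\s\S]*MESSAGE END/.test(msg);

console.log(dotOnly, alternation, charClass);
```

Note that in (\s|\S)* the group is repeated, so it captures only the last character it matched. If you need the text between the markers, a form such as ((?:\s|\S)*?) or ([\s\S]*?) keeps the whole run in the group.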

0

I've created an example in JavaScript.

const texto = `
MESSAGE BEGIN

mary had a little lamb.

little lamb

MESSAGE END
`

const regex = /MESSAGE\sBEGIN[\s\S]*MESSAGE\sEND/gi

console.log(texto.match(regex))

The output is:
[ 'MESSAGE BEGIN\n\nmary had a little lamb.\n\nlittle lamb\n\nMESSAGE END' ]

The line breaks were kept.
