I'm using PowerShell, but I'm open to other potential solutions.

I have a long string and need to replace several sequences of characters, identified by their position in the string, with a mask character (a period or a space). I don't know what the original characters will be, only that they must be replaced with something else. I have written code that walks the string using Mid and position numbers, but that is cumbersome, and I am wondering whether there is a faster or more elegant method.

Example: Given the 2 strings:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
12345678901234567890123456

I want to replace characters 2-4, 9-10, 17-21, and 25 (counting from 1) with ., yielding:

A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6

I can do that with a series of Mid calls, but I wanted to know whether there is some sort of faster masking function to make this happen. I have to do this across millions of rows, and seconds count.

3 Answers

Try this:

$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
 '12345678901234567890123456') -Replace $regex,$replace

A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6

The -replace operator is slower than string.replace() for a single operation, but has the advantage of being able to operate on an array of strings, which is faster than the string method plus a foreach loop.

Here's a sample implementation (requires V4, because it uses -PipelineVariable):

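$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

# Each pipeline item is a batch of lines; -replace masks the whole batch at once.
filter fix-file {
 $_ -replace $regex,$replace |
 add-content "c:\mynewfiles\$($file.name)"
}

# -PipelineVariable keeps the current file in $file so fix-file can reuse its name.
get-childitem c:\myfiles\*.txt -PipelineVariable file |
 get-content -ReadCount 1000 | fix-file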

If you would rather describe the layout with a mask string (- keeps a character, . replaces it), you can generate $regex and $replace from it:

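$mask = '-...----..------.....---.-'

# Wrap each run of '-' in a capture group, then turn every '-' into the regex wildcard '.'
$regex = [regex]($mask -replace '(-+)','($1)').replace('-','.')

# Collapse each run of '-' to one placeholder and number the placeholders $1, $2, ...
$replace =
([char[]]($mask -replace '-+','-') |
 foreach {$i=1}{if ($_ -eq '.'){$_} else {'$'+$i++}} {}) -join ''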

$regex.ToString()
$replace

(.)...(....)..(......).....(...).(.)
$1...$2..$3.....$4.$5
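
The mask-to-regex conversion above can be wrapped in a small helper so that the pair is built once and reused for every batch of lines. This is only a minimal sketch, not part of the original answer: the function name New-MaskPattern and its Regex and Replacement properties are made up for illustration, and it assumes the same convention (- keeps a character, . replaces it).

```powershell
# Hypothetical helper: builds the regex/replacement pair from a mask string
# in which '-' keeps a character and '.' replaces it.
function New-MaskPattern {
    param([string]$Mask)

    # Wrap each run of '-' in a capture group, then let every '-' match any character.
    $pattern = ($Mask -replace '(-+)', '($1)').Replace('-', '.')

    # Collapse each run of '-' to one placeholder and number them $1, $2, ...
    $replacement = ([char[]]($Mask -replace '-+', '-') |
        ForEach-Object -Begin { $i = 1 } -Process {
            if ($_ -eq '.') { '.' } else { '$' + $i++ }
        }) -join ''

    [pscustomobject]@{ Regex = [regex]$pattern; Replacement = $replacement }
}

$p = New-MaskPattern '-...----..------.....---.-'
$lines = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '12345678901234567890123456'

# -replace is applied to every element of the array.
$masked = $lines -replace $p.Regex, $p.Replacement
$masked
```

For the two sample strings this should print the same masked lines as shown earlier in this answer.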

Here's another approach:

C:\PS> $mask ="-...----..------.....---.-"
C:\PS> ([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | % {$i=0}{if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''

A...EFGH..KLMNOP.....VWX.Z

And if we are going to take advantage of V4 features :-), try this:

C:\PS> $i=0;([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ').Foreach({if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
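
Since the same mask is applied to every row, one possible variation is to work out the masked positions once and then overwrite only those positions in a character array for each row. The sketch below is an assumption-laden illustration rather than a measured improvement: the function name ConvertTo-MaskedString and its parameters are hypothetical, and whether it beats the pipeline version on millions of rows is worth checking with Measure-Command on real data.

```powershell
$mask = '-...----..------.....---.-'

# Computed once: zero-based positions that the mask marks for replacement.
$maskedPositions = 0..($mask.Length - 1) | Where-Object { $mask[$_] -eq '.' }

# Hypothetical helper: overwrites the given positions with the mask character.
function ConvertTo-MaskedString {
    param(
        [string]$Text,
        [int[]]$Positions,
        [char]$MaskChar = '.'
    )

    $chars = $Text.ToCharArray()
    foreach ($p in $Positions) {
        # Skip positions beyond the end of a short row instead of failing.
        if ($p -lt $chars.Length) { $chars[$p] = $MaskChar }
    }
    $chars -join ''
}

$result = ConvertTo-MaskedString -Text 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' -Positions $maskedPositions
$result
```

With the sample mask, $result should be A...EFGH..KLMNOP.....VWX.Z, the same as the one-liners above.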

3 Comments

Good lord that seems overly complicated. By the way, you can also create a char array containing A to Z with the following: [char[]]([char]'A'..[char]'Z')
Really liked the explicit mask. Made a variation on this using string formatting.
Yeah I didn't think it was that complicated. A simple loop over each character and test for mask or not seems more like CS 101. :-)

Here's yet another approach:

C:\PS> $mask = "{0}...{4}{5}{6}{7}..{10}{11}{12}{13}{14}{15}.....{21}{22}{23}.{25}"
C:\PS> $singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C:\PS> $mask -f $singlecharstrings

A...EFGH..KLMNOP.....VWX.Z
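
Typing the format string by hand gets tedious for long rows, so it can also be generated from the dash-and-dot mask used in the other answers. This is a sketch under that assumption (- keeps a character, . replaces it); the variable names are illustrative only. If the mask character were { or }, it would need to be doubled to survive -f.

```powershell
$mask = '-...----..------.....---.-'

# Emit a numbered placeholder for each kept position and a literal '.' otherwise.
$i = 0
$format = ([char[]]$mask | ForEach-Object {
    if ($_ -eq '-') { "{$i}" } else { '.' }
    $i++
}) -join ''

$singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$formatted = $format -f $singlecharstrings
$formatted
```

The generated $format should be identical to the hand-written mask above.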
