I have a problem:

I have an XML file which contains:

<colortable>
<color id="1" type="transparent"/>
<color id="2"/>
<color id="3" values="1.0"/>
<color id="4" type="rgb" values="0.0,0.0,0.0"/>
<color id="5" type="rgb" values="1.0,1.0,1.0"/>
</colortable>
<imagetable>
<imagedata id="1" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.437248.1395746975.csfolha1v2SemMensagem_Tim.jpg">
</imagedata>
<imagedata id="2" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.42189.1400584131.csfolha2v2fiscal_Tim.jpg">
</imagedata>
<imagedata id="3" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.488328.1422006304.DT1_Image6_T.jpg">
</imagedata>
<imagedata id="4" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.1262464.1427173896.csfolha3v2fiscal_Tim.jpg">
</imagedata>
<imagedata id="5" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.54571.1400584131.csfolha0v2fiscal_Tim.jpg">
</imagedata>
</imagetable>

I want to change the path from the one above to C:\images\

I'm trying to use this PowerShell code:

while ($line = [Console]::In.ReadLine()) 
{ 
  switch -wildcard ($line) 
  { 
   '<imagedata*' {$line -replace '[A-Z]{1}:.+[r][.]([0-9]+[.]){2}', 'c:\images\'} 
   default {$line}
  }
}

For every line starting with <imagedata, I want this to find the path (matched by a regex) and replace it with the new path.

This isn't working. How can I fix it?

3 Comments
  • `C:\images` instead of what? The last part seems to be the name of an image file, which I think you would want to keep as it is. And what is the output of your current code? Commented Mar 17, 2016 at 16:57
  • Please explain what, exactly, isn't working. Commented Mar 17, 2016 at 17:43
  • Works fine for me. I don't know what you want to keep, but it replaces the path (and the "id" part of the filename) with c:\images\. The most puzzling thing in the script is [Console]::In.ReadLine(), which I really can't understand why you're using. Commented Mar 17, 2016 at 19:11

1 Answer

I have not used PowerShell before, but I saw this under the RegEx tag, so I figured I would give it a look. I believe your problem comes from this part of the pattern: :.+[r]. The + there is a greedy quantifier, which could cause some issues for you. Perhaps try the following instead:

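# Compare against $null so that a blank input line does not end the loop early
while ($null -ne ($line = [Console]::In.ReadLine()))
{
  switch -wildcard ($line)
  {
   # .+? is lazy: it stops at the first r.<digits>.<digits>. it can reach
   '<imagedata*' {$line -replace '[A-Za-z]:.+?r\.(\d+\.){2}', 'c:\images\'}
   default {$line}
  }
}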

Adding a question mark (?) makes the + lazy instead of greedy, which should then allow you to properly match anything up to r.######.#######. (inclusive). I also swapped out your [.] with \. to match a literal period, and I swapped [0-9] with \d (which is just a shorter way to write it). As someone pointed out in the comments, though, are you sure you want to replace the r.######.#######. section?
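To make the greedy/lazy difference concrete, here is a minimal sketch. The path in $sample is hypothetical (it is not from the question's data) and was chosen because it contains two r.<digits>.<digits>. sequences. On the question's own sample lines, which each contain only one such sequence, both patterns appear to match the same span, which fits the comment that the original code "works fine".

```powershell
# Hypothetical path with two r.<digits>.<digits>. sequences (not from the question)
$sample = 'E:\tmp\r.1.2.copy_of_r.3.4.photo.jpg'

# Greedy: .+ takes as much as it can, so the match runs up to the LAST sequence
$greedy = $sample -replace '[A-Za-z]:.+r\.(\d+\.){2}', 'c:\images\'

# Lazy: .+? takes as little as it can, so the match stops at the FIRST sequence
$lazy = $sample -replace '[A-Za-z]:.+?r\.(\d+\.){2}', 'c:\images\'

$greedy   # c:\images\photo.jpg
$lazy     # c:\images\copy_of_r.3.4.photo.jpg
```

Both strings are single-quoted, so PowerShell passes the backslashes through to the regex engine unchanged.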

If you end up wanting to keep the r.######.########. part of the filename, you can do that by using a RegEx positive lookahead instead of actually matching it. However, as a warning, some RegEx engines reject lookarounds (typically lookbehinds) whose length cannot be determined. I am not sure how PowerShell handles variable-length lookaheads, but here is an implementation using one (assuming PowerShell supports it):

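# Compare against $null so that a blank input line does not end the loop early
while ($null -ne ($line = [Console]::In.ReadLine()))
{
  switch -wildcard ($line)
  {
   # (?=...) checks that r.<digits>.<digits>. follows without consuming it
   '<imagedata*' {$line -replace '[A-Za-z]:.+?(?=r\.(\d+\.){2})', 'c:\images\'}
   default {$line}
  }
}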

As an example, take your line containing: E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.437248.1395746975.csfolha1v2SemMensagem_Tim.jpg. Rather than replacing: E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.437248.1395746975., the second version would, in theory, replace only: E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\. Thus, it would preserve the whole filename. Again, this depends on PowerShell's support for lookahead, and you may actually want to replace the first part of the filename; I just wanted to offer it as an alternative in case you do want to preserve the whole filename.
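PowerShell's -replace operator uses the .NET regex engine, which does support variable-length lookahead, so the lookahead version can be tried directly. The sketch below runs it over a few lines copied from the question's XML. Looping over an in-memory array (rather than reading standard input) and the names $lines, $pattern and $result are choices made for this illustration only; in a real script the lines might come from something like Get-Content.

```powershell
# A few lines copied from the question's XML, held in an array for the demo
$lines = @(
    '<imagetable>',
    '<imagedata id="1" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.437248.1395746975.csfolha1v2SemMensagem_Tim.jpg">',
    '</imagedata>'
)

# The lookahead asserts that r.<digits>.<digits>. follows, but leaves it out of the match
$pattern = '[A-Za-z]:.+?(?=r\.(\d+\.){2})'

$result = foreach ($line in $lines) {
    if ($line -like '<imagedata*') {
        # Only the directory part is replaced; the whole filename is kept
        $line -replace $pattern, 'c:\images\'
    }
    else {
        $line
    }
}

$result
# The second line becomes:
# <imagedata id="1" source="c:\images\r.437248.1395746975.csfolha1v2SemMensagem_Tim.jpg">
```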

I hope that helps. Let me know if anything is unclear. You can read more about RegEx, and specifically lookahead and lookbehind, at regular-expressions.info.

2 Comments

You may not know PowerShell but your solution is flawless. I just wanted to add a link to a RegEx101 explanation with the sample XML, and your solution in place. Also, if they wanted to keep the beginning of the file names, and had issues with using a lookahead, they could use this for the replacement text: 'c:\images\$1'
Well, I'm glad to know it works, @TheMadTechnician lol Thanks for the additional link and suggestion about replacements. I only know RegEx because I spent a few years using it pretty heavily in VB.NET and PHP for scraping, so PowerShell-specific things (such as 'C:\images\$1') are pretty much lost on me. Glad to know my solution is correct, though.
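
The 'c:\images\$1' replacement suggested in the comments could be sketched as follows. The comment gives only the replacement text, so the pattern here is an assumption: it is the answer's lazy pattern with the r.<digits>.<digits>. part wrapped in parentheses so that it becomes capture group 1.

```powershell
# One line copied from the question's XML
$line = '<imagedata id="3" source="E:\xml2pdf_universal_physical_layer\tmp\dbres22C79BB2A484491458226919210\r.488328.1422006304.DT1_Image6_T.jpg">'

# Assumed pattern: the outer parentheses capture r.<digits>.<digits>. as group 1
$pattern = '[A-Za-z]:.+?(r\.(\d+\.){2})'

# Single quotes stop PowerShell from expanding $1, so the regex engine
# receives it and substitutes the text of capture group 1
$updated = $line -replace $pattern, 'c:\images\$1'

$updated
# <imagedata id="3" source="c:\images\r.488328.1422006304.DT1_Image6_T.jpg">
```

With a double-quoted replacement string, PowerShell would try to expand $1 as a variable before the regex engine sees it, so single quotes matter here.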
