Following on from my previous question, I want to strip the whitespace (spaces and tabs) between tags in a string in PHP.

$html = '<tr>       <td>A     </td>                <td>B   </td>      <td>C    </td>       </tr>';

should be converted to

$html = '<tr><td>A     </td><td>B   </td><td>C    </td></tr>';

How do I write the PHP equivalent of the JavaScript statement str.replace(/>\s+</g,'><');?

4 Comments
  • preg_replace() Commented Aug 2, 2011 at 1:24
  • @Phil: No. Don't use regex for HTML parsing. Commented Aug 2, 2011 at 1:34
  • @Tomalak In this case I'd say it's valid. The HTML is not being parsed anyway; it's simple string manipulation. Also, it was a direct answer to the question. Commented Aug 2, 2011 at 1:38
  • @Tomalak ... further, were it an entire document, I'd definitely agree with you. Commented Aug 2, 2011 at 1:46

2 Answers

4
$str = preg_replace('/(?<=>)\s+(?=<)/', '', $str);
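
As a rough sketch of how this behaves on the sample string from the question: the snippet below compares a direct port of the JavaScript call with the lookaround pattern above. The variable names `$direct` and `$lookaround` are only illustrative and are not part of the original answer.

```php
<?php
// Sample input copied from the question.
$html = '<tr>       <td>A     </td>                <td>B   </td>      <td>C    </td>       </tr>';

// Direct port of the JavaScript call: match '>', a whitespace run, and '<',
// then write '><' back. preg_replace() replaces every match by default,
// so there is no equivalent of the /g flag to add.
$direct = preg_replace('/>\s+</', '><', $html);

// Lookaround variant: (?<=>) requires a '>' just before the whitespace and
// (?=<) requires a '<' just after it. Only the whitespace itself is matched,
// so it is replaced with an empty string.
$lookaround = preg_replace('/(?<=>)\s+(?=<)/', '', $html);

echo $direct, PHP_EOL;
// <tr><td>A     </td><td>B   </td><td>C    </td></tr>

var_dump($direct === $lookaround);
// bool(true)
```

Both forms leave the whitespace inside the cells (after A, B and C) untouched, because that whitespace is not preceded by a '>'.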

Less prone to breakage, but uses some more resources:

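<?php
$html = '<tr>       <td>A     </td>                <td>B   </td>      <td>C    </td>       </tr>';
// Parse the fragment; the HTML parser wraps it in <html> and <body> elements.
$d = new DOMDocument();
$d->loadHTML($html);
$x = new DOMXPath($d);
// Select every text node that contains only whitespace and empty it.
foreach ($x->query('//text()[normalize-space()=""]') as $textnode) {
    $textnode->deleteData(0, strlen($textnode->wholeText));
}
// Serialize only the first child of <body>, i.e. the original <tr> fragment.
echo $d->saveXML($d->documentElement->firstChild->firstChild);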

2 Comments

Not a wholly robust solution, but depending on the input it may be sufficient in practice.
@Tomalak: indeed, somewhat iffy when talking SGML/HTML/XML of course, breakage possibilities aplenty. I'll offer an alternative.
0

http://sandbox.phpcode.eu/g/54ba6.php

result

<tr><td>A     </td><td>B   </td><td>C    </td></tr>

code

<?php 
$html = '<tr>       <td>A     </td>                <td>B   </td>      <td>C    </td>       </tr>'; 
$html = preg_replace('~(</td>)([\s]+)(<td>)~', '$1$3', $html); 
$html = preg_replace('~(<tr>)([\s]+)(<td>)~', '$1$3', $html); 
echo preg_replace('~(</td>)([\s]+)(</tr>)~', '$1$3', $html);

5 Comments

Thanks for the solution. Could you explain what "~" means in '~(</td>)([\s]+)(<td>)~' and what "$1$3" does? Thanks.
@Charles: Substitution. Take a look at the preg_replace manual page, and read a book on regular expressions.
@Charles PHP can use characters other than the forward slash as regular expression delimiters. That's useful if your pattern contains forward slashes (as in this case), as you don't need to escape them.
@genesis Why capture the whitespace in a character class?
@Phil: Good question. It's redundant, and genesis, you've done this before!
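
To make the points raised in these comments concrete, here is a small sketch (a variation on the answer's code, not genesis's original): it keeps `~` as the delimiter so the `/` in `</td>` needs no escaping, drops the redundant capture group and character class around `\s+`, and uses `$1$2` to write the two captured tags back with nothing between them. The slash-delimited pattern in `$escaped` is only there for comparison.

```php
<?php
// Sample input copied from the question.
$html = '<tr>       <td>A     </td>                <td>B   </td>      <td>C    </td>       </tr>';

// With '/' as the delimiter, the slash inside '</td>' has to be escaped.
$escaped = preg_replace('/(<\/td>)\s+(<td>)/', '$1$2', $html);

// With '~' as the delimiter, the same pattern needs no escaping.
// Group 1 is the tag before the gap and group 2 the tag after it;
// the whitespace is matched by a bare \s+ and is not captured.
$tilde = preg_replace('~(</td>)\s+(<td>)~', '$1$2', $html);

var_dump($escaped === $tilde);
// bool(true)

// The remaining two gaps, handled the same way as in the answer.
$html = preg_replace('~(<tr>)\s+(<td>)~', '$1$2', $tilde);
$html = preg_replace('~(</td>)\s+(</tr>)~', '$1$2', $html);

echo $html, PHP_EOL;
// <tr><td>A     </td><td>B   </td><td>C    </td></tr>
```

Because the whitespace is no longer captured, the replacement refers to groups 1 and 2 rather than 1 and 3.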
