I'm trying to find the color that the CSS sets for a span inside a link in the following HTML example, using DOMDocument/XPath:

   <html>
      <head>
          <style>
             a span{
                color: #21d;
             }
          </style>
      </head>
      <body>
          <a href='test.html'>this is a <span>test</span></a>
      </body>
   </html>

I can get all the CSS with the XPath '//style' ($css = $path->query( '//style' )->item(0)->nodeValue) and then use preg_match to extract the result, but I wonder whether there is a way to get this color using XPath alone and, if so, how.

  • Let's see some code to show us what you've tried. XPath is used to traverse the DOM, it's not a CSS parser. Commented Oct 15, 2018 at 20:21
  • @miken32, I get the CSS with $css = $path->query( '//style' )->item(0)->nodeValue, like in the question... Commented Oct 15, 2018 at 20:24
  • @miken32, it's not a duplicate of that post... That post is asking how to make a regexp, this question is about solving it using xpath - if that's even possible Commented Oct 15, 2018 at 20:28
  • It's not possible, as I said. Commented Oct 15, 2018 at 20:29
  • You can do it with Selenium and XPath; however, like @miken32 said, not with a DOMDocument parser or any other PHP library that uses libxml. They are used for raw parsing of XML. Commented Oct 15, 2018 at 20:51

1 Answer

XPath is not particularly well adapted to this kind of task, but contrary to what's been put forth in the comments it is possible using evaluate() and some nested string functions like substring-before() and substring-after():

$html = '
    <html>
      <head>
          <style>
             a span{
                background-color: #ddd;
                color: #21d;
             }
          </style>
      </head>
      <body>
          <a href="test.html">this is a <span>test</span></a>
      </body>
   </html>
';

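$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// evaluate() returns a typed result (here a string) rather than a node list.
$result = $xpath->evaluate("
    substring-before(
        substring-after(
            substring-after(
                normalize-space(//style/text())
            , 'a span')
        , ' color:')
    , ';')
");
// The extracted value still carries the space that followed "color:".
echo trim($result);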

OUTPUT:

#21d

Working from the inside out:

  1. Normalize whitespace.
  2. Get the part of the style text after your selector.
  3. Get the text after the CSS property in question. Notice I added a space before ' color:' to avoid matching background-color or the like. Normalizing the whitespace in step one makes this work even if color: was preceded by a tab or a newline.
  4. Get the string before the first ; that follows, which ends the color declaration.
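
To see what each layer contributes, the steps above can be sketched by evaluating the expression one level at a time. This is only an illustration: the sample document is a condensed copy of the one above, and the $step1 to $step4 variable names are made up for the demo.

```php
<?php
// Condensed copy of the sample document used in this answer.
$html = '<html><head><style>
    a span{
        background-color: #ddd;
        color: #21d;
    }
</style></head><body><a href="test.html">this is a <span>test</span></a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Build the expression one layer at a time, innermost first.
$step1 = "normalize-space(//style/text())";
$step2 = "substring-after($step1, 'a span')";
$step3 = "substring-after($step2, ' color:')";
$step4 = "substring-before($step3, ';')";

// Evaluate every intermediate expression and print it between brackets
// so that leading and trailing whitespace stays visible.
$steps = [];
foreach ([$step1, $step2, $step3, $step4] as $i => $expr) {
    $steps[] = $xpath->evaluate($expr);
    echo ($i + 1) . ': [' . $steps[$i] . "]\n";
}
```

The last line printed shows a space before #21d, because substring-after() keeps everything that follows ' color:'. Trimming the result in PHP (or wrapping the whole expression in another normalize-space()) removes it.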

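There are plenty of ways for this to fail: an earlier rule whose selector merely contains 'a span', a color: declaration written without a preceding space or a trailing semicolon, or more than one <style> element. I wouldn't recommend using XPath for something like this, but it's an interesting exercise all the same.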
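
For comparison, here is a rough sketch of the regex route mentioned in the question: use XPath only to collect the <style> text, then pick the declaration apart in PHP. findDeclaration() is a hypothetical helper written for this example, not part of any library, and it is nowhere near a real CSS parser: it assumes flat rules with a single selector and ignores comments, selector lists and @media blocks.

```php
<?php
/**
 * Hypothetical helper: returns the value of $property inside the first rule
 * whose selector is exactly $selector, or null when nothing matches.
 */
function findDeclaration(string $css, string $selector, string $property): ?string
{
    // Accept "selector { body }" only at the start of the text or right after
    // a closing brace, so "#nav a span" does not count as "a span".
    $rule = '/(?:^|\})\s*' . preg_quote($selector, '/') . '\s*\{([^}]*)\}/';
    if (!preg_match($rule, $css, $m)) {
        return null;
    }
    // Inside the body the property must start a declaration, which keeps
    // "background-color" from being mistaken for "color".
    $decl = '/(?:^|;)\s*' . preg_quote($property, '/') . '\s*:\s*([^;]+)/';
    return preg_match($decl, $m[1], $d) ? trim($d[1]) : null;
}

// Sample document with an extra rule that also contains the text "a span".
$html = '<html><head><style>
    #nav a span { color: red; }
    a span{
        background-color: #ddd;
        color: #21d;
    }
</style></head><body><a href="test.html">this is a <span>test</span></a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Concatenate the text of every <style> element in the document.
$css = '';
foreach ($xpath->query('//style') as $style) {
    $css .= $style->textContent . "\n";
}

$color = findDeclaration($css, 'a span', 'color');
echo $color, "\n";
```

With this sample the helper skips the #nav a span rule, whereas the pure XPath expression would stop at the first occurrence of 'a span' and pick up that rule's color instead.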

1 Comment

Wonderful way of thinking outside the box, something some other people don't seem to be able to do (the "no, not possible, downvote and leave" kind)... Thanks for this answer, and especially for the "wouldn't recommend" part. I solved the problem by parsing the CSS with a regexp, as in my first idea. I can indeed see this solution going wrong if there's another CSS rule with 'a span' in the ID... Nevertheless, thanks for pointing out that it can be done.
