0

I need to parse the web page https://www.galliera.it/118 and extract the numbers shown under the coloured bars.

This is my code, which doesn't work:

<?php
    ini_set('display_errors', 1);

    $url = 'https://www.galliera.it/118';

    print "The url ... ".$url;
    echo '<br>';
    echo '<br>';

    //#Set CURL parameters ...
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    //print "Data ... ".$data;
    //echo '<br>';
    //echo '<br>';

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    // This is the xpath for a number under a bar ....
    // /html/body/div[2]/div[1]/div/div/ul/li[6]/span
    // How may I get it?
    // The following code doesn't work, it's only to show my goals ..

    $greenWaitingNumber = $xpath->query('/html/body/div[2]/div[1]/div/div/ul/li[6]/span');
    $theText = (string).$greenWaitingNumber;

    print "Data ... ".$theText;
    echo '<br>';
    echo '<br>';

?>

Any suggestions / examples / alternatives?

5
  • 2
    "That doesn't work": can you be more specific? Also, (string).$greenWaitingNumber is a syntax error, and you can't just echo a DOMElement like that (a SimpleXMLElement can be echoed when using SimpleXML). Commented Feb 23, 2017 at 21:51
  • You're right, sorry. I get a white page and the web console says "Error 500". I think the problem is the line $theText = (string).$greenWaitingNumber; but I'm not sure whether the $xpath->query call is right (note that I obtained the XPath using the browser's interactive "Inspect element" function). Commented Feb 23, 2017 at 21:55
  • 2
    Your XPath is good for one specific value because of the index notation, but to get all of them you need something more generic at the start: /html/body/div/div/div/div/ul/li[6]/span Commented Feb 23, 2017 at 22:00
  • OK, thanks. So $greenWaitingNumber = $xpath->query('/html/body/div[2]/div[1]/div/div/ul/li[6]/span'); is correct, I suppose. How can I print the $greenWaitingNumber value in this case? Commented Feb 23, 2017 at 22:15
  • 1
    $greenWaitingNumber = $xpath->query('/html/body/div[2]/div[1]/div/div/ul/li[6]/span'); $theText = $greenWaitingNumber[0]->nodeValue; will give you "2" Commented Feb 24, 2017 at 22:17

2 Answers

3

Here is a PHP script that extracts the data you asked for into a nicely structured array. You can inspect the script's output and change the structure as you need. Cheers!

$html = file_get_contents("https://www.galliera.it/118");

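$dom = new DOMDocument();
// collect libxml warnings (HTML5 tags, malformed markup) instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$finder = new DOMXPath($dom);

// find all elements whose class list contains "row"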
$rows = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' row ')]");

$data = array();
foreach ($rows as $row) {
    $groupName = $row->getElementsByTagName('h2')->item(0)->textContent;
    $data[$groupName] = array();

    // find all direct children of the row whose class list contains "box"
    $boxes = $finder->query("./*[contains(concat(' ', normalize-space(@class), ' '), ' box ')]", $row);
    foreach ($boxes as $box) {
        $subgroupName = $box->getElementsByTagName('h3')->item(0)->textContent;
        $data[$groupName][$subgroupName] = array();

        $listItems = $box->getElementsByTagName('li');
        foreach ($listItems as $k => $li) {

            $class = $li->getAttribute('class');
            $text = $li->textContent;

            if (!strlen(trim($text))) {
                // this should be the graph bar, so skip it
                continue;
            }

            // I see only integer numbers so I cast to int; otherwise you can change the type or even not cast at all
            $data[$groupName][$subgroupName][] = array('type' => $class, 'value' => (int) $text);
        }
    }
}

echo '<pre>' . print_r($data, true) . '</pre>';

The output looks something like this:

Array
(
    [SAN MARTINO - 15:30] => Array
        (
            [ATTESA: 22] => Array
                (
                    [0] => Array
                        (
                            [type] => rosso
                            [value] => 1
                        )

                    [1] => Array
                        (
                            [type] => giallo
                            [value] => 12
                        )

                    [2] => Array
                        (
                            [type] => verde
                            [value] => 7
                        )

                    [3] => Array
                        (
                            [type] => bianco
                            [value] => 2
                        )

                )

            [VISITA: 45] => Array
                (
                    [0] => Array
                        (
                            [type] => rosso
                            [value] => 5
                        )
...
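
If you only need a single number out of that structure, one possible approach is to re-index each subgroup by its type. This is just a sketch: the sample array is copied from the output above, and valuesByType() is a hypothetical helper name, not part of the script.

```php
<?php
// Sample data copied from the output shown above (first subgroup only).
$data = [
    'SAN MARTINO - 15:30' => [
        'ATTESA: 22' => [
            ['type' => 'rosso',  'value' => 1],
            ['type' => 'giallo', 'value' => 12],
            ['type' => 'verde',  'value' => 7],
            ['type' => 'bianco', 'value' => 2],
        ],
    ],
];

// Hypothetical helper: turn a list of type/value pairs into [type => value].
function valuesByType(array $entries): array
{
    return array_column($entries, 'value', 'type');
}

$waiting = valuesByType($data['SAN MARTINO - 15:30']['ATTESA: 22']);

echo $waiting['verde'], "\n";   // 7
echo array_sum($waiting), "\n"; // 22, the same total as in the "ATTESA: 22" heading
```

Note that the group and subgroup keys contain the time and the totals, so they change on every request; in real use you would probably loop over the keys rather than hard-code them.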

3

This might help simplify your XPath expression for this specific case.

The query below finds every span element that is a direct child of an li element whose class attribute is exactly "verde".

The // notation means "match at any level in the document", so you don't have to build your query from the root.

/* @var $node DOMElement */
$greenWaitingNumber = $xpath->query('//li[@class="verde"]/span');
foreach( $greenWaitingNumber as $node )
{
  echo $node->nodeValue;
}

Note: this will not match an element with several classes, such as class="verde foo bar".
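
If the page ever does put several classes on those elements, a token-based match is one way around the limitation. The sketch below compares the two queries on a small made-up snippet (the markup is an assumption for illustration, not the real page):

```php
<?php
// Hypothetical sample markup: the second li carries extra classes.
$html = '<ul>'
      . '<li class="verde"><span>7</span></li>'
      . '<li class="verde foo bar"><span>3</span></li>'
      . '<li class="giallo"><span>12</span></li>'
      . '</ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact attribute comparison: misses class="verde foo bar".
$exact = $xpath->query('//li[@class="verde"]/span');

// Token match: pad the class list with spaces and look for " verde ".
$token = $xpath->query(
    "//li[contains(concat(' ', normalize-space(@class), ' '), ' verde ')]/span"
);

$exactValues = [];
foreach ($exact as $node) {
    $exactValues[] = $node->nodeValue;
}

$tokenValues = [];
foreach ($token as $node) {
    $tokenValues[] = $node->nodeValue;
}

echo implode(',', $exactValues), "\n"; // 7
echo implode(',', $tokenValues), "\n"; // 7,3
```

Padding with spaces matters: a plain contains(@class, 'verde') would also match an unrelated class such as "verdemare".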


If you're only interested in one particular value...

$greenWaitingNumber = $xpath->query('/html/body/div[2]/div[1]/div/div/ul/li[6]/span');
$theText = $greenWaitingNumber[0]->nodeValue;

$theText will then contain "2", which you can print with echo.
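
An absolute path like this stops matching as soon as the page layout changes, and reading nodeValue from a missing node is an error. A minimal defensive sketch follows; it assumes PHP 7.1 or later for the nullable return type, firstNodeText() is a hypothetical helper name, and the markup is a made-up stand-in for the real page:

```php
<?php
// Hypothetical stand-in for the downloaded page.
$html = '<html><body><ul><li class="verde"><span>2</span></li></ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Return the trimmed text of the first match, or null when nothing matches.
function firstNodeText(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$found   = firstNodeText($xpath, '/html/body/ul/li[1]/span');
$missing = firstNodeText($xpath, '/html/body/ul/li[6]/span'); // no sixth li here

echo $found === null ? 'not found' : $found, "\n";
echo $missing === null ? 'not found' : $missing, "\n";
```

Here item(0) is used instead of [0]; both read the first node of the list, but item() also works on older PHP versions where array-style access to a DOMNodeList is not available.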
