I'm trying to scrape this web page ...

http://prontosoccorso.usl4.toscana.it/attesa/home.asp

using PHP and XPath to get the number values under the red, yellow, green and white colored circles.

(NOTE: you may see different values on that page when you browse it; that doesn't matter, since they change dynamically.)

I'm trying to use this PHP code sample to print the value ...

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';

    $xpath_for_parsing = '//*[@id="prontosoccorso"]/tbody/tr[2]/td[2]';

    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');

    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>

The code runs without errors, but the result is always 0!

I've noticed that if you use

    $xpath_for_parsing = '//*[@id="prontosoccorso"]';

the result is

Situazione aggiornata al giorno 30/12/2017 alle ore 14:09 Rosso Giallo Verde Azzurro Bianco Pazienti in attesa (totale 0) 0 0 0 0 0 Pazienti in visita (totale 0) 0 0 0 0 0 Pazienti trattati nelle ultime ore 0 0 0 0 0

so the result 0 for my values is consistent (and if you run curl http://prontosoccorso.usl4.toscana.it/attesa/home.asp from the command line, you'll see that the values are all zero there as well).

Analyzing the traffic with the browser console, I can't find the request that fetches the real values. Any help or suggestions?

Thank you in advance.

1 Answer

One thing to notice is that even when you open that page in a browser, all the fields start at 0, which is why I tried loading the page twice. That alone didn't work, so I then made cURL store the cookies between calls, and the values started to show up.

The code is mostly what you have, with extra curl_setopt() calls that create a cookie file. (It may be enough to create the file once and reuse it afterwards, but I haven't verified that.)

The XPath only fetches the first row of values, but it can easily be adapted for the other rows.

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

$url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';

//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
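// Keep cookies between requests; without this, both requests return zeros.
$cookies = "./cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);

// Deliberately request the page twice: the first response contains only zeros.
$data = curl_exec($ch);
$data = curl_exec($ch);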
curl_close($ch);
$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
$dom->loadHTML($data);

$xpath = new DOMXPath($dom);
$xpath_for_parsing = '//table[@id="prontosoccorso"]/tbody/tr[2]/td';

$colorWaitingNumber = $xpath->query($xpath_for_parsing);

$theValue = 'N.D.';
foreach( $colorWaitingNumber as $node )
{
    // Print each cell of the row on its own line.
    $theValue = trim($node->nodeValue);
    echo $theValue . PHP_EOL;
}
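
To adapt the query to the other rows, one option is to loop over every tr after the header and collect its td cells. The following is only a sketch: it runs against a made-up static copy of the table (the numbers are sample data, not real values), assumes the first row holds the color names and the first cell of each row holds the label, and extractRows() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the values of every data row,
// keyed by the label found in the first cell of the row.
function extractRows(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    // position() > 1 skips the header row (assumed to contain the color names).
    foreach ($xpath->query('//table[@id="prontosoccorso"]/tbody/tr[position() > 1]') as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->nodeValue);
        }
        if ($cells === []) {
            continue; // row without td cells
        }
        $label = array_shift($cells); // first cell is the row label
        $rows[$label] = $cells;
    }
    return $rows;
}

// Sample markup with invented numbers, standing in for the real page.
$sample = <<<HTML
<table id="prontosoccorso"><tbody>
<tr><th></th><th>Rosso</th><th>Giallo</th><th>Verde</th><th>Azzurro</th><th>Bianco</th></tr>
<tr><td>Pazienti in attesa</td><td>1</td><td>2</td><td>3</td><td>0</td><td>4</td></tr>
<tr><td>Pazienti in visita</td><td>0</td><td>5</td><td>6</td><td>1</td><td>0</td></tr>
</tbody></table>
HTML;

print_r(extractRows($sample));
```

With the real page you would pass $data from the second curl_exec() call instead of $sample. If the live markup differs from these assumptions, the XPath needs adjusting.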

You could add some logic that reloads the page whenever all values are 0, but this code simply calls curl_exec() twice.
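
A rough sketch of that reload logic, under the same assumptions about the table layout as above. firstRowValues(), fetchUntilPopulated() and samplePage() are hypothetical names, and the fetcher here is a stub that returns canned markup so the example runs without network access.

```php
<?php
// Hypothetical helper: values of the first data row, without its label cell.
function firstRowValues(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $values = [];
    foreach ($xpath->query('//table[@id="prontosoccorso"]/tbody/tr[2]/td[position() > 1]') as $td) {
        $values[] = trim($td->nodeValue);
    }
    return $values;
}

// Hypothetical helper: calls $fetch (any callable returning the page HTML)
// until the row contains a non-zero value or $maxAttempts is reached.
function fetchUntilPopulated(callable $fetch, int $maxAttempts = 3): array
{
    $values = [];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $html = $fetch();
        if (!is_string($html) || $html === '') {
            continue; // failed request (curl_exec() returns false on error)
        }
        $values = firstRowValues($html);
        $nonZero = array_filter($values, function ($v) { return $v !== '0'; });
        if ($nonZero !== []) {
            break; // at least one real value showed up
        }
    }
    return $values;
}

// Builds made-up markup for the stub fetcher below.
function samplePage(array $cells): string
{
    return '<table id="prontosoccorso"><tbody><tr><th></th></tr>'
        . '<tr><td>Pazienti in attesa</td><td>' . implode('</td><td>', $cells)
        . '</td></tr></tbody></table>';
}

// Stub standing in for curl_exec(): zeros on the first call, values afterwards.
$responses = [samplePage([0, 0, 0, 0, 0]), samplePage([1, 2, 3, 0, 4])];
$calls = 0;
$fetch = function () use (&$calls, $responses) {
    return $responses[min($calls++, count($responses) - 1)];
};

print_r(fetchUntilPopulated($fetch));
```

With the cURL handle from the code above, the fetcher could be function () use ($ch) { return curl_exec($ch); }, called before curl_close($ch). Note that a row that really is all zeros is indistinguishable from the initial empty response, so the loop simply gives up after $maxAttempts and returns the zeros.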
