3

Using CURL to get content from website. Getting response in object. How to convert that object in to PHP Simple HTML DOM Parser

function get_data($url) 
{
    $ch = curl_init();
    $timeout = 30;
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,false);
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
    curl_setopt($ch,CURLOPT_POST,false);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0");
    //curl_exec($ch);
    $dom = new simple_html_dom(curl_exec($ch));
    print_r( $dom );
    curl_close($ch);
    return $data;
}
$url = 'http://www.example.com';
$data = get_data($url);

?>

Result

simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [1] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object *RECURSION* ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [callback] => [lowercase] => 1 [original_size] => 1 [size] => 1 [pos:protected] => 1 [char:protected] => [cursor:protected] => 2 [parent:protected] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [token_blank:protected] => [token_equal:protected] => =/> [token_slash:protected] => /> [token_attr:protected] => > [_charset] => UTF-8 [_target_charset] => UTF-8 [default_br_text:protected] => [default_span_text] => [self_closing_tags:protected] => Array ( [img] => 1 [br] => 1 [input] => 1 [meta] => 1 [link] => 1 [hr] => 1 [base] => 1 [embed] => 1 [spacer] => 1 ) [block_tags:protected] => Array ( [root] => 1 [body] => 1 [form] => 1 [div] => 1 [span] => 1 [table] => 1 ) [optional_closing_tags:protected] => Array ( [tr] => Array ( [tr] => 1 [td] => 1 [th] => 1 ) [th] => Array ( [th] => 1 ) [td] => Array ( [td] => 1 ) [li] => Array ( [li] => 1 ) [dt] => Array ( [dt] => 1 [dd] => 1 ) [dd] => Array ( [dd] => 1 [dt] => 1 ) [dl] => Array ( [dd] => 1 [dt] => 1 ) [p] => Array ( [p] => 1 ) [nobr] => Array ( [nobr] => 1 ) [b] => Array ( [b] => 1 ) [option] => Array ( [option] => 1 ) ) [doc:protected] => 1 [noise:protected] => Array ( ) ) 


2 Answers 2

13

You're not creating the DOM correctly, you must do it like this:

// Create a DOM object
$dom = new simple_html_dom();
// Load HTML from a string
$dom->load(curl_exec($ch))

print_r( $dom );

Check the Manual for more details...

Edit

It seems that is a cURL settings problem, please refer to the documentation to configure it correctly...

This is a function I usualy use to download some pages, feel free to adjust it to your needs:

function dlPage($href) {

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);

    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);

    return $dom;
    }

$url = 'http://www.example.com/';
$data = dlPage($url);
print_r($data);
Sign up to request clarification or add additional context in comments.

10 Comments

Getting same object like above on result
i also did print_r(curl_exec($ch));. its returning 1;
thank you for update. Still getting an object. print_r(curl_exec($curl)); showing error.. s23.postimg.org/p3xw754t7/main.jpg
@AlizainPrasla, Well, it is working for me... echo gettype(curl_exec($curl)); will give the value string meaning that curl_exec() is returning a string wich is the html code, then when you print it displays the scraped page, and that's what I'm getting... So, are you sure the error comes from here ? try executing that code from a separate file and check the result...
Well descriptive answer. If i change my url to www.example.com so its works exactly what you explained. but if i change my url to vconnect.com/nigeria/businesses-from-a its showing error which i shown you above
|
3

Curl will return a string containing the HTML right? Just use the quick start sample?

$html = str_get_html(curl_exec($ch));

2 Comments

Curl is not returning string.. its a OBJECT. check result
curl_exec will return a string if you set CURLOPT_RETURNTRANSFER to true. If you set it to false, it will output the result directly. In that case, you're getting a boolean telling you if the exec was succesfull.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.