0

I am using the PHP Simple HTML DOM Parser to extract the Top Goalscorers table (it shows the top 5) from this page: http://www.transfermarkt.co.uk/en/chinese-super-league/startseite/wettbewerb_CSL.html

The table I want to parse, Top Goal Scorers, has the ID "spieler". I want to read each table row and output the rows in my own format. The problem is that the Name / Club column contains a nested <table>, which the site uses to lay out the image, player name and club name.

I am trying to understand the DOM structure so I know what to select to get the right player name, club name and goals.

Here's what I have so far:

<textarea id='txt_out'>
<?php
echo "Player | Team | Goals\n:--|:--|:--:\n";
   
$url = "http://www.transfermarkt.co.uk/en/chinese-super-league/startseite/wettbewerb_CSL.html";
$html = file_get_html($url);

foreach($html->find('#spieler') as $row) {
    
  if ($i > 0) {
   $player = $row->find('table tr',3)->plaintext;
        echo $player . "|TEST TEAM|0";
    }
   $i++;
}
?>
</textarea>

But this prints only the header lines; no player rows are output:

<textarea id="txt_out">Player | Team | Goals
:--|:--|:--:
</textarea>
1
  • Won't $html->find('#spieler') return the table with the id spieler (i.e. an array of one item)? It seems to me that something like #spieler>tbody>tr[class] table tr would get you all the rows that have data, and only those. It probably won't affect the overall result, but it would obviate the need for the counter and all that. Commented May 5, 2013 at 10:59

2 Answers

2

There you go (you will have to adjust the selectors a bit to get your desired output). In this solution I take every td, skip the ones that contain the inner table, and output the plaintext of the rest.

$output = '<table border="1">
                <tr>
                    <td>#</td>
                    <td>Player</td>
                    <td>Team</td>
                    <td>goals-1</td>
                    <td>goals-2</td>
                    <td>goals-3</td>
                    <td>points</td>
                </tr>
            ';

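// Assumes the Simple HTML DOM library file is available as simple_html_dom.php.
require_once 'simple_html_dom.php';

$url = "http://www.transfermarkt.co.uk/en/chinese-super-league/startseite/wettbewerb_CSL.html";
$html = file_get_html($url);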

$tbl = $html->find('#spieler',0);

$trs = $tbl->find('tr[class=dunkel],tr[class=hell]');

foreach($trs as $tr){
    $output .= '<tr>';
    $tds = $tr->find('td');
    foreach($tds as $td){
        $inner_table = $td->find('table',0);
        if(!$inner_table){  
            $text = trim($td->plaintext);
            if($text != ''){
                $output .= '<td>' . $text . '</td>'; // reuse the trimmed text
            }
        }  
    }
    $output .= '</tr>';
}

$output .= '</table>';

echo($output);
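
If pipe-delimited Markdown is wanted instead of an HTML table (as in the header the question prints), the cell texts collected in the loop above could be joined with `|` rather than wrapped in `<td>` tags. The following is only a minimal sketch: `markdownRow()` is a hypothetical helper, not part of Simple HTML DOM, and the `$rows` array holds made-up sample values standing in for the scraped cells.

```php
<?php
// Hypothetical helper: turns one row of cell strings into a Markdown table row.
function markdownRow(array $cells): string
{
    // Trim each cell and escape literal pipes so they cannot split a column.
    $clean = array_map(function ($cell) {
        return str_replace('|', '\|', trim($cell));
    }, $cells);

    return implode(' | ', $clean);
}

// Made-up sample data standing in for the scraped player, team and goals cells.
$rows = [
    ['Player A', 'Club X', '5'],
    ['Player B', 'Club Y', '4'],
];

// Same header the question prints, followed by one line per row.
$markdown = "Player | Team | Goals\n:--|:--|:--:\n";
foreach ($rows as $row) {
    $markdown .= markdownRow($row) . "\n";
}

echo $markdown;
```

In the loop above, the trimmed `$text` values of one `tr` would be pushed into an array and passed to the helper once per row.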

2 Comments

I'm trying to output it as Markdown inside a text box, so each row would use a delimiter like |, and there's no need for the points or assists columns... hmmm
jeeesus - so change it to your needs - do you want me to come to your house and feed you with a spoon also ? this is the global solution - whatever you want from here is very easy to format
0

Use DOMNodeList::item(). It takes a zero-based index as its argument, so item(1) returns the second table:

 $table = $dom->getElementsByTagName('table')->item(1);
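
For context, here is a slightly fuller sketch of that idea using PHP's built-in DOM extension rather than Simple HTML DOM. The `$sampleHtml` markup is made up for illustration and only loosely mirrors the nested-table layout described in the question; the real page's structure, and therefore the right index, may differ.

```php
<?php
// Made-up markup: an outer table whose middle cell holds a nested table.
$sampleHtml = '<html><body>
<table id="spieler">
  <tr>
    <td>1</td>
    <td><table><tr><td>Player A</td></tr><tr><td>Club X</td></tr></table></td>
    <td>5</td>
  </tr>
</table>
</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);

// Matches come back in document order: index 0 is the outer table,
// index 1 is the nested one.
$table = $dom->getElementsByTagName('table')->item(1);

// Gather the trimmed text of every cell in the nested table.
$cells = [];
foreach ($table->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Note that `item()` returns `null` when the index is out of range, so real code should check the result before calling methods on it.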

1 Comment

Unfortunately, he's using "Simple HTML DOM" (barf). It doesn't return a DOMNodeList; it just returns arrays (or single elements, if you specify an index to find or getElementsByTagName).
