
I'm trying to read a table from an HTML file into an array, and I'm stuck. Any help would be appreciated.

Each table entry should be stored as a single array value, for example:

$arr[1] = DER HE1 ges 1

PHP:

<?php
      libxml_use_internal_errors(true);
      $i=0;
      // new dom object  
      $dom = new DOMDocument();  

      //load the html  
      $html = $dom->loadHTMLFile("106642new.html");  

      //discard white space   
      $dom->preserveWhiteSpace = false;   

      //the table by its tag name  
      $tables = $dom->getElementsByTagName('table');   

      //get all rows from the table  
      $rows = $tables->item(0)->getElementsByTagName('tr');   
      // $test = $tables->item(0)->getElementsByTagName('td');   

      // loop over the table rows  
      foreach ($rows as $row) {
          // get each column by tag name  
          $cols = $row->getElementsByTagName('td');  
          $i= $i + 1 ;
          $value = "Nummer: ".$i.":  ".$cols->item(0)->nodeValue.PHP_EOL;
          // $value = "test: ".$i.":  ".$cols->item(0)->nodeValue.PHP_EOL;
          $cols = array(1, 2, 3, 4, 5);
          echo $value;
          //  $cols[$i] = $row; 
          // echo the values    
          //echo $cols->item(0)->nodeValue ; 
      }   
?>

HTML:

<body bgcolor="#FFFFFF" topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">

          <div align=left>

          <table BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH="100%" height="100%">

          <tr><td valign="top">&nbsp</td></tr>

          <tr><td valign="top">

          <p font class="Header">Basisrooster schooljaar 2011 2012 (m.i.v. 12-09-11)</font></p>
          <br><div font class="lNameHeader"> </font> </div><table border=1>
          <tr class="AccentDark">
           <td align="left" width="65" class="tableHeader"></td>
           <td align="center" width="auto" class="tableHeader">Maandag</td>
           <td align="center" width="auto" class="tableHeader">Dinsdag</td>
           <td align="center" width="auto" class="tableHeader">Woensdag</td>
           <td align="center" width="auto" class="tableHeader">Donderdag</td>
           <td align="center" width="auto" class="tableHeader">Vrijdag</td>
          </tr><tr>
           <td align="left" width="50" class="tableHeader">1e uur</td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell"></td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">WAS</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE09</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">econ</td>
           <td align="left" width="9" class="tableCell">5</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">WIK</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC17</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">biol</td>
           <td align="left" width="9" class="tableCell">4</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">OTT</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC01</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">dutl</td>
           <td align="left" width="9" class="tableCell">6</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell"></td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
          </tr>
          <tr>
           <td align="left" width="50" class="tableHeader">2e uur</td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">KEJ</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC02</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">wisA</td>
           <td align="left" width="9" class="tableCell">3</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">BRT</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE05</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">netl</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">OTT</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC01</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">dutl</td>
           <td align="left" width="9" class="tableCell">6</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">BAU</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HG01</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">lo</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">MET</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HD02</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">entl</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
          </tr>
          <tr>
           <td align="left" width="50" class="tableHeader">3e uur</td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">WAS</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE07</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">econ</td>
           <td align="left" width="9" class="tableCell">5</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">MET</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HD02</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">entl</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">WAS</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE05</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">econ</td>
           <td align="left" width="9" class="tableCell">5</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">BAU</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HG01</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">lo</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">KEJ</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC02</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">wisA</td>
           <td align="left" width="9" class="tableCell">3</td>
          </tr>
          </table>
          </td>
          </tr>
          <tr>
           <td align="left" width="50" class="tableHeader">4e uur</td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell"></td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">DER</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE08</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">ges</td>
           <td align="left" width="9" class="tableCell">1</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">KEJ</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC06</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">wisA</td>
           <td align="left" width="9" class="tableCell">3</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">DER</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE10</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">ges</td>
           <td align="left" width="9" class="tableCell">1</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">CHR</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HB15</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">ckv</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
          </tr>
          <tr>
           <td align="left" width="50" class="tableHeader">5e uur</td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">DOC</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE09</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">m&o</td>
           <td align="left" width="9" class="tableCell">2</td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell"></td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell"></td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">MET</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HD02</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">entl</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">BRT</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HE05</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">netl</td>
           <td align="left" width="9" class="tableCell"></td>
          </tr>
          </table>
          </td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">OTT</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC03</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">dutl</td>
           <td align="left" width="9" class="tableCell">6</td>
          </tr>
          </table>
          </td>
          </tr>
          <tr>
           <td align="left" width="50" class="tableHeader">6e uur</td>
           <td align="left" width="auto" class="tableCell"><table border="0" cellpadding="0" cellspacing="0" >
          <tr>
           <td align="left" width="41" class="tableCell">OTT</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="75" class="tableCell">HC03</td>
           <td align="left" width="3" class="tableCell">&nbsp</td>
           <td align="left" width="73" class="tableCell">dutl</td>
           <td align="left" width="9" class="tableCell">6</td>
          </tr>
          </table>
          </td>
  • Why are you stuck? Is there an error message, etc.? It's usually more rewarding and educational to solve these kinds of problems yourself. Commented Sep 15, 2011 at 14:18
  • Thanks for your comment. Yes, I tried it myself but can't get it solved. The problem is that my output is not "nummer1: BAU HG01 lo nummer2: DEN HG01 lo ..."; instead it skips numbers and puts multiple elements into one value, like "number22: DER HG01 lo DAVE H48A GS". Commented Sep 15, 2011 at 14:21

2 Answers

1

I think the problem is that your first table is a container for other tables. If you want to get the contents of all the tables, then you should also iterate through the list of tables.
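To illustrate why a container table produces skipped numbers and merged values, here is a minimal sketch on a made-up fragment (not the posted file). It relies on two DOM behaviours: getElementsByTagName() matches descendants at any depth, and nodeValue returns the concatenated text of everything inside a node.

```php
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Illustrative fragment: one outer table whose only cell holds two inner tables.
$dom->loadHTML(
    '<table><tr><td>' .
    '<table><tr><td>WAS</td><td>HE09</td></tr></table>' .
    '<table><tr><td>WIK</td><td>HC17</td></tr></table>' .
    '</td></tr></table>'
);
libxml_clear_errors();

$outer = $dom->getElementsByTagName('table')->item(0);
$rows  = $outer->getElementsByTagName('tr');

// The rows of both nested tables are counted along with the outer row.
$rowCount = $rows->length;

// The first <td> of the outer row is the container cell, so its nodeValue
// is the text of both inner tables run together.
$firstCell = $rows->item(0)->getElementsByTagName('td')->item(0)->nodeValue;

echo $rowCount, PHP_EOL;   // 3
echo $firstCell, PHP_EOL;  // WASHE09WIKHC17
```

So a loop over $rows visits the container row once (yielding everything at once) and then each nested row again, which would match the symptoms described in the comments.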

If you just want to get the contents of an inner table, then first try to locate it in the DOM. I suggest finding the first table, then getting all table elements inside it and iterating through them.
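A minimal sketch of that approach, under a few assumptions: readInnerTables is a hypothetical helper name, the sample markup is a shortened stand-in for the posted file, and DOMXPath is used instead of nested getElementsByTagName() calls to pick only the tables that contain no further table. Both a decoded non-breaking space and a literal "&nbsp" are treated as blank, since how an unterminated "&nbsp" is parsed may depend on the libxml2 version.

```php
<?php
// Hypothetical helper: returns one string per innermost table.
function readInnerTables(string $html): array
{
    libxml_use_internal_errors(true);   // the markup is malformed; collect warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($dom);
    $result = [];

    // Select only tables that do not contain another table.
    foreach ($xpath->query('//table[not(.//table)]') as $table) {
        $parts = [];
        foreach ($xpath->query('.//td', $table) as $cell) {
            // Treat non-breaking spaces (decoded, or left as literal "&nbsp") as blanks.
            $text = trim(str_replace(["\xC2\xA0", '&nbsp'], ' ', $cell->nodeValue));
            if ($text !== '') {
                $parts[] = $text;
            }
        }
        // Empty slots are kept as '' so that positions still line up with the days.
        $result[] = implode(' ', $parts);
    }
    return $result;
}

// Shortened sample in the shape of the posted timetable.
$sample = <<<HTML
<table><tr><td>
 <table border=1>
  <tr><td>1e uur</td>
   <td><table><tr><td>WAS</td><td>&nbsp;</td><td>HE09</td><td>&nbsp;</td><td>econ</td><td>5</td></tr></table></td>
   <td><table><tr><td></td><td>&nbsp;</td><td></td><td>&nbsp;</td><td></td><td></td></tr></table></td>
  </tr>
 </table>
</td></tr></table>
HTML;

$arr = readInnerTables($sample);
print_r($arr);
```

For the real file, something like readInnerTables(file_get_contents("106642new.html")) should work the same way. The header cells ("Maandag", "1e uur", ...) sit in a table that contains other tables, so this query skips them.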

var_dump is a good starting point for debugging. You don't need anything beyond what you already have; just debug and test more :)

0

I'm guessing that the fact that it's invalid HTML/XML is screwing you over.

You're using loadHTMLFile(), which is meant to cope with malformed HTML to an extent (unlike the XML loaders, it does not need well-formed input), but badly broken markup can still produce a tree you don't expect.

If the parser were treating the input as strict XML, then the "<br>" would not be interpreted as a stand-alone node but as the opening tag of a node, meaning that everything after it would become a sub-node of "<br>".
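One way to check this guess is to load a small fragment and look at where the parser puts the nodes. A minimal sketch, assuming PHP's DOM extension backed by libxml2; the fragment and variable names are made up for illustration:

```php
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// An unclosed <br> directly followed by a table, as in the posted markup.
$dom->loadHTML('<div><br><table><tr><td>x</td></tr></table></div>');
libxml_clear_errors();

$br    = $dom->getElementsByTagName('br')->item(0);
$table = $dom->getElementsByTagName('table')->item(0);

$brHasChildren = $br->hasChildNodes();          // did <br> swallow what follows it?
$tableParent   = $table->parentNode->nodeName;  // which element holds the table?

var_dump($brHasChildren);
var_dump($tableParent);
```

If this prints bool(false) and "div", the <br> was parsed as an empty element and the table is its sibling, in which case the <br> is unlikely to be the cause.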

Furthermore this line here doesn't make any sense:

<p font class="Header">Basisrooster schooljaar 2011 2012 (m.i.v. 12-09-11)</font></p>

The <font> tag has been obsolete for years and should never be used. More importantly, this is not a font tag at all: it's a p tag with a stray "font" attribute, yet it is still closed with </font> as if it were a font tag. Just do:

<p class="Header">Basisrooster schooljaar 2011 2012 (m.i.v. 12-09-11)</p>

So the solution may be that your HTML/XML is invalid.

(Dan Bizdadea also has a good point.)

1 Comment

Yeah, I didn't write the table, otherwise it would be easier to extract the data. @Dan, could you give an example of how to do that? I have no clue what to change in my code.
