How can I use the PHP DOM parser to parse the content of the table so that I get:

  • the username
  • the mobile phone number
  • the status

The output I am trying to extract would be:

  • randomusername - 0123456789 - active
  • randomusername2 - 0987654321 - active

This is the HTML I am trying to parse (part of it):

...
<div class="table tbl-process-mobile">
  <div class="table-cn">
    <div class="table-bd">
      <table cellspacing="0" id="idd7">

<thead>
    <tr id="idd9">
        <th scope="col">
          <span>username</span>
        </th>
        <th scope="col">
          <span>status</span>
        </th>

        <th scope="col">        
          <span>prefered number</span>
        </th>

        <th scope="col">
          <span>action</span>
        </th>
    </tr>
</thead>

<tbody id="iddb">
    <tr class="even">
        <td class="even">
            <div>randomusername</div>
        </td><td class="odd">
            <div>0123456789</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="iddc" style="display:none"></span>
  <a href="xyz" id="idb2"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddd" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb3" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb4" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr><tr class="odd">
        <td class="even">
            <div>randomusername2</div>
        </td><td class="odd">
            <div>0987654321</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="idde" style="display:none"></span>
  <a href="xyz" id="idb5"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddf" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb6" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb7" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr>
</tbody>
</table>
    </div>
  </div>
</div>
...

I have already started with some PHP code:

<?php
error_reporting(0);
$matches = array();
$dom = new DOMDocument;

$dom->loadHTMLFile('settings.html');

How do I extract the values? What is the best way to parse the HTML from this point?

4
  • Iterate over each tr. Commented Nov 5, 2016 at 17:18
  • Which values do you need: the ones related to a tag, or to an id? Commented Nov 5, 2016 at 17:26
  • Sidenote: Should there be any errors somewhere related to either PHP or an incorrect file path, error_reporting(0); won't help. It's best to catch and display during testing. Commented Nov 5, 2016 at 17:28
  • @scaisEdge take a look above, I showed the things I need to extract! Commented Nov 5, 2016 at 17:29

3 Answers

2
$field_names = ['username', 'phone', 'status'];
$result = [];

// Search for div tags having the tbl-process-mobile class
// ($dom is the DOMDocument loaded in the question)
$containers = $dom->getElementsByTagName('div');
foreach ($containers as $container) {
  // getAttribute() returns an empty string when the attribute is missing,
  // so no separate existence check is needed
  if (false === strpos($container->getAttribute('class'),
    'tbl-process-mobile'))
    continue;

  // Assume that tbody tags are required
  if (!$tbodies = $container->getElementsByTagName('tbody'))
    continue;

  // Get the first tbody (there should not be more)
  if (!$tbodies->length || !$tbody = $tbodies->item(0))
    continue;

  foreach ($tbody->getElementsByTagName('tr') as $tr) {
    $i = 0;
    $row = [];
    $cells = $tr->getElementsByTagName('td');

    // Collect at most count($field_names) cell values, in column order
    foreach ($field_names as $name) {
      if (!$td = $cells->item($i++))
        break;
      $row[$name] = trim($td->textContent);
    }

    if ($row)
      $result []= $row;
  }
}

var_dump($result);

Sample Output

array(2) {
  [0]=>
  array(3) {
    ["username"]=>
    string(14) "randomusername"
    ["phone"]=>
    string(10) "0123456789"
    ["status"]=>
    string(6) "active"
  }
  [1]=>
  array(3) {
    ["username"]=>
    string(15) "randomusername2"
    ["phone"]=>
    string(10) "0987654321"
    ["status"]=>
    string(6) "active"
  }
}

No comments, as the code is self-explanatory.

P.S.: in the sense of parsing, the HTML structure leaves a lot to be desired.
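
To get from the collected array to the "username - phone - status" lines the question asks for, the rows can be joined with implode(). This is only a minimal sketch: $result is hard-coded here with the same shape as the sample output above so that the snippet runs on its own, and $lines is a name introduced for illustration.

```php
<?php
// Same shape as the $result array built by the loop above;
// hard-coded here so the snippet is self-contained.
$result = [
    ['username' => 'randomusername',  'phone' => '0123456789', 'status' => 'active'],
    ['username' => 'randomusername2', 'phone' => '0987654321', 'status' => 'active'],
];

$lines = [];
foreach ($result as $row) {
    // Join the three fields in the order requested in the question
    $lines[] = implode(' - ', [$row['username'], $row['phone'], $row['status']]);
}

echo implode(PHP_EOL, $lines), PHP_EOL;
// randomusername - 0123456789 - active
// randomusername2 - 0987654321 - active
```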


0

You can use the selector methods of the DOMDocument class, such as getElementById() and getElementsByTagName(), to find the target elements. After finding the elements, read their text and store it in an array.

$trs = $dom->getElementById("iddb")->getElementsByTagName("tr");
$arr = [];
foreach($trs as $key=>$tr){
    $tds = $tr->getElementsByTagName("td");
    // trim() drops the whitespace surrounding the nested <div> text
    $arr[$key] = [
        trim($tds->item(0)->textContent),
        trim($tds->item(1)->textContent),
        trim($tds->item(2)->textContent)
    ];
}

Check result in demo
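
For reference, here is a self-contained sketch of the same getElementById() approach. It makes a few assumptions that are not in the original: the HTML is a cut-down inline copy of the question's table rather than settings.html, libxml_use_internal_errors() is used to keep parser warnings quiet, and the field names and the guards for a missing tbody or short rows are added for illustration.

```php
<?php
// Cut-down copy of the question's table (action cells shortened)
$html = <<<HTML
<table cellspacing="0" id="idd7">
  <tbody id="iddb">
    <tr class="even">
      <td class="even"><div>randomusername</div></td>
      <td class="odd"><div>0123456789</div></td>
      <td class="even"><div>active</div></td>
      <td class="odd"><div><a href="xyz">set number</a></div></td>
    </tr>
    <tr class="odd">
      <td class="even"><div>randomusername2</div></td>
      <td class="odd"><div>0987654321</div></td>
      <td class="even"><div>active</div></td>
      <td class="odd"><div><a href="xyz">set number</a></div></td>
    </tr>
  </tbody>
</table>
HTML;

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$rows = [];
$tbody = $dom->getElementById('iddb');
if ($tbody !== null) {               // getElementById() returns null when nothing matches
    foreach ($tbody->getElementsByTagName('tr') as $tr) {
        $tds = $tr->getElementsByTagName('td');
        if ($tds->length < 3) {
            continue;                // skip rows that lack the three data cells
        }
        $rows[] = [
            'username' => trim($tds->item(0)->textContent),
            'phone'    => trim($tds->item(1)->textContent),
            'status'   => trim($tds->item(2)->textContent),
        ];
    }
}

foreach ($rows as $row) {
    echo implode(' - ', $row), PHP_EOL;
}
```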

You can also use the DOMXPath class to find the target elements.

$xpath = new DOMXPath($dom);
$trs = $xpath->query("//tbody/tr");
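
The XPath query only returns the rows, so the cells still have to be read. One possible way, sketched here under assumptions, is to evaluate a relative expression per row. The inline HTML, the field names and the narrower //tbody[@id="iddb"]/tr expression are illustrative choices, not part of the original answer.

```php
<?php
// Minimal inline stand-in for the question's table
$html = '<table><tbody id="iddb">'
      . '<tr><td><div>randomusername</div></td><td><div>0123456789</div></td>'
      . '<td><div> active </div></td><td><div>set number</div></td></tr>'
      . '<tr><td><div>randomusername2</div></td><td><div>0987654321</div></td>'
      . '<td><div>active</div></td><td><div>set number</div></td></tr>'
      . '</tbody></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];
foreach ($xpath->query('//tbody[@id="iddb"]/tr') as $tr) {
    // evaluate() with a context node resolves td[n] relative to the current row;
    // normalize-space() returns the cell text with surrounding whitespace removed
    $rows[] = [
        'username' => $xpath->evaluate('normalize-space(td[1])', $tr),
        'phone'    => $xpath->evaluate('normalize-space(td[2])', $tr),
        'status'   => $xpath->evaluate('normalize-space(td[3])', $tr),
    ];
}

print_r($rows);
```

Note that XPath positions are 1-based, so td[1] is the first cell, unlike item(0) on a DOMNodeList.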


-1

Try using strip_tags:

$html='<div class="table tbl-process-mobile">
  <div class="table-cn">
    <div class="table-bd">
      <table cellspacing="0" id="idd7">

<thead>
    <tr id="idd9">
        <th scope="col">
          <span>username</span>
        </th>
        <th scope="col">
          <span>status</span>
        </th>

        <th scope="col">        
          <span>prefered number</span>
        </th>

        <th scope="col">
          <span>action</span>
        </th>
    </tr>
</thead>

<tbody id="iddb">
    <tr class="even">
        <td class="even">
            <div>randomusername</div>
        </td><td class="odd">
            <div>0123456789</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="iddc" style="display:none"></span>
  <a href="xyz" id="idb2"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddd" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb3" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb4" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr><tr class="odd">
        <td class="even">
            <div>randomusername2</div>
        </td><td class="odd">
            <div>0987654321</div>
        </td><td class="even">
            <div>active</div>
        </td><td class="odd">
            <div>
  <span id="idde" style="display:none"></span>
  <a href="xyz" id="idb5"><span>set number</span></a>
</div>
        </td><td class="even">
            <div>
  <a id="iddf" style="display:none"></a>
  <a href="xyz" class="action-icon-edit" id="idb6" title="change">
    <i>change</i>
  </a>
  <a href="xyz" class="action-icon-delete" id="idb7" title="delete">
    <i>delete</i>
  </a>
</div>
        </td>
    </tr>
</tbody>
</table>
    </div>
  </div>
</div>';

echo strip_tags($html);

Updated:

You can parse the DOM elements using getElementsByTagName.

Read all td elements:

$td = $dom->getElementsByTagName('td');

Then loop through the td elements and read the div contents:

foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        echo $d->textContent;
    }
}

The loop above prints the contents of every div, but only particular div elements are wanted. I suggest adding a class or data attribute to the divs you want to retrieve and then putting an if condition inside the loop. Here I use the class data.

foreach ($td as $t) {
    $div = $t->getElementsByTagName('div');
    foreach ($div as $d) {
        if ($d->getAttribute('class') == 'data') {
            echo $d->textContent;
        }
    }
}
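
The comparison getAttribute('class') == 'data' only matches when data is the sole class on the element. Below is a rough sketch of a token-aware check that also keeps the values grouped per row. Everything in it is an assumption for illustration: the markup with the data class is hypothetical (the question's HTML has no such class), and hasClass() is a made-up helper, not a DOM method.

```php
<?php
// Hypothetical markup: the "data" class is NOT present in the question's HTML
$html = '<table><tbody>'
      . '<tr><td><div class="data">randomusername</div></td>'
      . '<td><div class="data highlight">0123456789</div></td>'
      . '<td><div>set number</div></td></tr>'
      . '</tbody></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// Hypothetical helper: true when $class is one of the whitespace-separated
// tokens in the element's class attribute
function hasClass(DOMElement $el, string $class): bool
{
    $tokens = preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($class, $tokens, true);
}

$rows = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $row = [];
    foreach ($tr->getElementsByTagName('div') as $div) {
        if (hasClass($div, 'data')) {
            $row[] = trim($div->textContent);
        }
    }
    if ($row) {
        $rows[] = $row;   // one entry per table row, so values stay grouped
    }
}

print_r($rows);
```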

8 Comments

No, this will just be chaos. Why break the structured data!
What do you mean?
I would like to extract the values into values, not this way. This is a chaos as @chris85 mentioned.
There is no organization this way. How will you know (programmatically) what the username is, phone number, etc. This is NOT how parsing is done.
So you want to get a particular element's value, right?
