Is there a way to extract a DataFrame from a nested HTML structure, the way pandas' read_html() does for a flat table?
Here is the HTML from which I need to extract all the columns starting with "action". Data.html:
<table border="1"><tr><th>Central Repository</th><td><table border="1"><tr><th>Passadena-USA</th><td><table border="1"><tr><th>Fairfax Av.</th><td><table border="1"><tr><th>CMS</th><td><table border="1"><tr><th>action</th><th>address</th><th>machinie_id</th><th>portal</th><th>supplier</th><th>created_by</th><th>date</th><th>portal deficit</th><th>Load Value 1</th><th>Load Value 2</th><th>Load Value 3</th><th>Load Value 4</th><th>Load Value 5</th><th>Sub Load 1</th><th>Sub Load 2</th><th>Sub Load 3</th><th>Sub Load 4</th><th>Sub Load 5</th><th>Coordinates</th><th>Area Code</th><th>pending case id</th><th>project details</th><th>identification number APAC</th><th>site_id</th><th>state</th><th>status</th><th>timestamp</th></tr><tr><td>FP</td><td>1195 Fairfax Avenue </td><td>ZEBA 5841</td><td>NHE-9850</td><td>CMS</td><td>Administrator</td><td>2017/6/19</td><td>687965</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>Relay 4-12 Avery J</td><td>Tonal One B</td><td>2602700</td><td>Tertiary Node</td><td>0</td><td>Volume Sub < 1</td><td>passadena</td><td>PA</td><td>2017/06/19 17:35:56</td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table>
Here is my Python code:
import pandas as pd
# read_html returns a list of DataFrames, so df is a list here
df = pd.read_html("Data.html")
print(df[3])
# Shouldn't index 3 return all the columns nested under "CMS",
# i.e. the columns action, address, machinie_id, portal, ... through timestamp?
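For reference, here is a minimal sketch of one possible workaround that does not depend on how read_html() indexes nested tables: walk the HTML with the standard-library html.parser and keep only the rows of the most deeply nested table. The class name InnermostTableParser and the shortened sample string are hypothetical, made up for illustration, and the sketch assumes there is a single innermost table whose first row is the header.

```python
from html.parser import HTMLParser


class InnermostTableParser(HTMLParser):
    """Hypothetical helper: collect the rows of the most deeply nested <table>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0       # current <table> nesting depth
        self.max_depth = 0   # deepest nesting level seen so far
        self.rows = []       # rows collected from the deepest table
        self.row = None      # row currently being built
        self.cell = None     # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > self.max_depth:
                # A deeper table starts: drop rows gathered from shallower ones.
                self.max_depth = self.depth
                self.rows = []
            self.row = None
            self.cell = None
        elif self.depth == self.max_depth:
            if tag == "tr":
                self.row = []
            elif tag in ("th", "td") and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif self.depth == self.max_depth:
            if tag in ("th", "td") and self.cell is not None:
                self.row.append("".join(self.cell).strip())
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.rows.append(self.row)
                self.row = None

    def handle_data(self, data):
        # Text can arrive in several fragments, so gather them per cell.
        if self.cell is not None and self.depth == self.max_depth:
            self.cell.append(data)


# Shortened stand-in for Data.html (assumption: same nesting pattern, fewer columns).
sample = (
    "<table><tr><th>CMS</th><td>"
    "<table><tr><th>action</th><th>address</th></tr>"
    "<tr><td>FP</td><td>1195 Fairfax Avenue </td></tr></table>"
    "</td></tr></table>"
)

parser = InnermostTableParser()
parser.feed(sample)
parser.close()

# Assumption: the first collected row holds the column names.
header, *records = parser.rows
print(header)
print(records)
```

With the real file, the string passed to feed() would be the contents of Data.html, and pd.DataFrame(records, columns=header) should then give the DataFrame with the columns from "action" through "timestamp".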
