
I apologize if this question has been answered elsewhere but I have been unsuccessful in finding a satisfactory answer here or elsewhere.

I am somewhat new to Python and pandas and am having some difficulty getting HTML data into a pandas DataFrame. The pandas documentation says .read_html() returns a list of DataFrame objects, so when I try to do some data manipulation to get rid of some samples I get an error.

Here is my code to read the HTML:

df = pd.read_html('http://espn.go.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)

Then I try to clean it up:

df = df.dropna(axis=0, thresh=4)

And I received the following error:

Traceback (most recent call last):
  File "module4.py", line 25, in <module>
    df = df.dropna(axis=0, thresh=4)
AttributeError: 'list' object has no attribute 'dropna'

How do I get this data into an actual dataframe, similar to what .read_csv() does?

2 Answers

27

From https://pandas.pydata.org/pandas-docs/version/0.17.1/io.html#io-read-html: "read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content".

So df = df[0].dropna(axis=0, thresh=4) should do what you want.
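
For reference, here is a minimal sketch of that fix. It assumes an HTML parser that read_html can use (lxml, or bs4 with html5lib) is installed. The inline HTML string, its column names, and the player rows are made up for illustration and stand in for the ESPN page, so the example runs without network access.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the ESPN page: one table with one mostly empty row.
html = """
<table>
  <thead>
    <tr><th>RK</th><th>PLAYER</th><th>GP</th><th>G</th><th>A</th><th>PTS</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Player A</td><td>82</td><td>35</td><td>52</td><td>87</td></tr>
    <tr><td></td><td>Spacer row</td><td></td><td></td><td></td><td></td></tr>
    <tr><td>2</td><td>Player B</td><td>82</td><td>38</td><td>48</td><td>86</td></tr>
  </tbody>
</table>
"""

tables = pd.read_html(StringIO(html))  # a list of DataFrames, one per table
df = tables[0]                         # the DataFrame itself
df = df.dropna(axis=0, thresh=4)       # keep rows with at least 4 non-NA values

print(type(tables).__name__, len(tables))
print(df)
```

Keeping the list in its own variable (tables here) and assigning only the selected element to df avoids the confusion of having a name called df that is actually a list.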


11

pd.read_html returns a list of DataFrames, one per table it finds. Here the list has one element, and that element is the pandas DataFrame, i.e.

df = pd.read_html(url)  # <-- list

df[0]  # <-- pandas DataFrame
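
If a page holds more than one table, the list has more than one element, and the table you want is not necessarily at index 0. The sketch below shows one way to inspect the list before indexing into it. The two-table HTML string is a made-up example, not the ESPN page, and it assumes lxml (or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second one holds the stats.
html = """
<table>
  <thead><tr><th>Season</th><th>Type</th></tr></thead>
  <tbody><tr><td>2015</td><td>Regular</td></tr></tbody>
</table>
<table>
  <thead><tr><th>PLAYER</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>87</td></tr>
    <tr><td>Player B</td><td>86</td></tr>
  </tbody>
</table>
"""

tables = pd.read_html(StringIO(html))

# Look at what came back before choosing an index.
for i, table in enumerate(tables):
    print(i, table.shape, list(table.columns))

stats = tables[1]  # pick the table by position once you know which one it is

# Alternatively, match= keeps only the tables whose text matches a string or regex.
matched = pd.read_html(StringIO(html), match="PTS")
```

Selecting by match= can be less brittle than a hard-coded index if the page layout changes, though it still returns a list, so matched[0] is the DataFrame.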
