
I am using pandas to grab some ice hockey stats from a web page as shown below:

import pandas as pd

url_goal = 'http://www.quanthockey.com/nhl/records/nhl-players-all-time-goals-per-game-leaders.html'
df_goal = pd.read_html(url_goal, index_col=0, header=0)[0]

This works great, but the problem is that switching to the second page of the stats table does not change the URL, so I cannot use the same approach to grab more than the top 50 players. There is a JavaScript address for the table that does change as the page number switches. I have read a little about Selenium and BeautifulSoup, but I don't have them installed, so I would prefer to do this without them if possible. So my question is two-fold:

  1. Is there any way to grab data from the different pages of this JavaScript table using only pandas and the standard Python/SciPy libraries (Anaconda, to be exact)?

  2. If not, how would you go about getting this data into a pandas DataFrame with the help of Selenium or your package of choice?

1 Answer


Hint: Open the network analyzer in your browser and watch what happens when you navigate to different pages; you'll notice a GET request to a page like

http://www.quanthockey.com/scripts/AjaxPaginate.php?cat=Records&pos=Players&SS=&af=0&nat=alltime&st=reg&sort=goals-per-game&page=3&league=NHL&lang=en&rnd=451318572

Notice the page part of the query string.

You can just iterate through the range of page numbers, incrementing the page parameter in the query string by one each time, and concatenate the resulting tables, as in the sketch below.
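For example, here is a minimal sketch of that approach. It assumes pd.read_html can parse the fragment returned by AjaxPaginate.php and that five pages are enough; the page range is illustrative, and the rnd parameter is dropped since the table loads fine without it.

import pandas as pd

base_url = ('http://www.quanthockey.com/scripts/AjaxPaginate.php'
            '?cat=Records&pos=Players&SS=&af=0&nat=alltime&st=reg'
            '&sort=goals-per-game&league=NHL&lang=en&page={page}')

frames = []
for page in range(1, 6):  # pages 1 through 5; widen the range for more players
    # each request returns a fragment containing one stats table
    frames.append(pd.read_html(base_url.format(page=page), index_col=0, header=0)[0])

df_goal = pd.concat(frames)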


3 Comments

This works great, thank you! Very useful tip overall; I wasn't aware of the network analyzer. Out of curiosity, do you know the purpose of that last random string of numbers? I am not including it and it works just fine.
Yes, the network analyzer is quite useful; most of the time it can help in coming up with a strategy. Not sure what the rnd parameter is; presumably it serves some purpose or else it wouldn't be there, perhaps some kind of internal record keeping.
Hi @Ryan, I have a link: quanthockey.com/khl/seasons/2020-21-khl-players-stats.html. I'm able to convert it to CSV using pandas, but I have to change the code for every year (2020-21, 2019-20, and so on). Is there any way to get all the available data without changing the year in the URL every time?
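A minimal sketch for that follow-up would be to build the season string in a loop, assuming every season page follows the same URL pattern; the year range here is purely illustrative.

import pandas as pd

seasons = [f'{y}-{str(y + 1)[-2:]}' for y in range(2015, 2021)]  # e.g. '2015-16' ... '2020-21'
frames = []
for season in seasons:
    url = f'http://www.quanthockey.com/khl/seasons/{season}-khl-players-stats.html'
    # read the first table on each season page
    frames.append(pd.read_html(url, index_col=0, header=0)[0])

df_khl = pd.concat(frames, keys=seasons)  # index each block by its season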
