I recently started learning Python, and one of my first projects was to scrape updates from my son's classroom web page and notify me when the site was updated. This turned out to be an easy project, so I wanted to expand on it and create a script that would automatically check whether any of our lotto numbers hit. Unfortunately, I haven't been able to figure out how to get the data from the website. Here is one of my attempts from last night.

from bs4 import BeautifulSoup
import urllib.request

webpage = "http://www.masslottery.com/games/lottery/large-winningnumbers.html"

websource = urllib.request.urlopen(webpage)
soup = BeautifulSoup(websource.read(), "html.parser")

span = soup.find("span", {"id": "winning_num_0"})
print(span)

Output is here...
<span id="winning_num_0"></span> 

The output listed above is also what I see if I "view source" in a web browser. When I use "Inspect Element" in the browser, I can see the winning numbers in the inspector panel. Unfortunately, I'm not sure how or where the browser is getting the data. Is it loading from another page or from a script in the background? I thought the following tutorial was going to help me, but I wasn't able to get the data using similar commands.

http://zevross.com/blog/2014/05/16/using-the-python-library-beautifulsoup-to-extract-data-from-a-webpage-applied-to-world-cup-rankings/

Any help is appreciated. Thanks

  • If the content is dynamic, you might need an approach based on, e.g., Selenium: selenium-python.readthedocs.io/api.html Commented Sep 15, 2016 at 12:21
  • Possible duplicate of Reading dynamically generated web pages using python Commented Sep 15, 2016 at 12:24
  • Checking from the developer console what that page does, it loads the data dynamically from here: masslottery.com/data/json/games/lottery/recent.json. So you could just write a script that loads that JSON-formatted data and checks the numbers from there. A lot easier than scraping HTML ;) Commented Sep 15, 2016 at 12:25
  • Selenium is definitely the approach that I would recommend in most cases, but you're lucky here - the static approach is actually even easier than what you were trying to do in the first place :) Commented Sep 15, 2016 at 12:36
  • Thanks for the quick replies. I will try both the static and dynamic approach since this is more of a learning project. Commented Sep 15, 2016 at 12:55

1 Answer

If you look closely at the source of the page (I just used curl), you can see this block:

<script type="text/javascript">
    // <![CDATA[
    var dataPath = '../../';
    var json_filename = 'data/json/games/lottery/recent.json';
    var games = new Array();
    var sessions = new Array();
    // ]]>
</script>

That recent.json stuck out like a sore thumb (I actually missed the dataPath part at first).

After giving that a try, I came up with this:

curl http://www.masslottery.com/data/json/games/lottery/recent.json

Which, as lari points out in the comments, is way easier than scraping HTML. This easy, in fact:

import json
import urllib.request
from pprint import pprint

url = 'http://www.masslottery.com/data/json/games/lottery/recent.json'

# Fetch the JSON file the page loads in the background, then parse it
with urllib.request.urlopen(url) as websource:
    data = json.loads(websource.read().decode())
pprint(data)

data is now a dict, and you can do whatever kind of dict-like things you'd like to do with it. And good luck ;)
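
Because the parsed JSON is nested, it can help to list the key paths before picking values out. The following is only a minimal sketch: `key_paths` is a hypothetical helper (not part of any library), and `sample` is made-up data standing in for `data`, since the real feed's keys may well differ.

```python
def key_paths(obj, prefix=""):
    """Yield a dotted path for every leaf value in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from key_paths(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from key_paths(value, f"{prefix}[{index}]")
    else:
        # Anything that is not a dict or list counts as a leaf
        yield prefix


# Made-up stand-in for the parsed JSON; the real feed's keys may differ.
sample = {"games": [{"name": "Example Game", "draws": [{"numbers": "1-2-3"}]}]}

paths = list(key_paths(sample))
for path in paths:
    print(path)
```

Calling `key_paths(data)` on the real response should give a quick map of which keys and list indexes lead to the numbers you care about.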

4 Comments

Thank you. I will try this tonight!
For added fun, you could always use Python's random module to guess lotto numbers and see how much money it would make you.
Your solution worked. Now I need to figure out how to easily extract the information from the dictionary since it is multi-level.
If it worked, then you should mark this as accepted by clicking the green checkmark to the left <---. For multi-level dictionaries you can simply chain []s, e.g. data['foo']['bar']
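
The chained lookups mentioned in the last comment might look roughly like this. It is only a sketch: `sample`, its keys (`games`, `name`, `draws`, `numbers`), the dash-separated number format, and `my_picks` are all assumptions for illustration, so check the real JSON and adjust the names.

```python
# Made-up stand-in for the parsed JSON; replace with the real `data`.
sample = {
    "games": [
        {"name": "Example Game", "draws": [{"numbers": "04-11-23-35-41"}]},
    ]
}

my_picks = {4, 11, 23, 30, 42}  # hypothetical ticket

# Chain [] for dict keys and list indexes, one level at a time.
latest = sample["games"][0]["draws"][0]["numbers"]

# Turn "04-11-..." into a set of ints and intersect it with the ticket.
drawn = {int(n) for n in latest.split("-")}
matches = sorted(drawn & my_picks)
print(f"Matched {len(matches)} numbers: {matches}")

# .get() returns None instead of raising KeyError when a key is missing.
missing = sample.get("no_such_key")
```

Plain `[]` raises `KeyError` or `IndexError` as soon as the feed's shape changes, which is often what you want in a small script; `.get()` is the quieter option when a key is optional.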
