Downloading data using Python from a website that uses JavaScript to display information

Question

I typically use the following template script to download data from a website:

import urllib.request as web
from bs4 import BeautifulSoup
...
url_to_visit = 'http://www.website-link-to-download-data'
source_code = web.urlopen(url_to_visit).read()  # read() returns bytes
source_code = ''.join(map(chr, source_code))  # convert the bytes to a str
source_code = source_code.split('\n')
## then further process the lines returned in `source_code` as needed

But sometimes I come across sites where this approach does not work.

Consider the site: https://www.spice-indices.com/idp2/Main#home. Suppose that, from the first table (Intraday Alerts - United States), I want to download via a Python script the information that is displayed when I click the SP TMI tab.

I looked at the contents of `source_code` produced by the script above, but I couldn't figure out how to extract the information I want. The site seems to use JavaScript to display the information. Can someone give me any pointers or suggestions?

I am using Python 3.x.

alecxe · Accepted Answer · 2015-11-15 03:58:07Z

When you activate the "SP TMI" tab, a POST request is sent to the "intraday-announcements.json" endpoint. Simulate that request in your code and parse the JSON response.

Sample working code using requests:

import requests

with requests.Session() as session:
    session.get("https://www.spice-indices.com/idp2/Main#home")

    response = session.post("https://www.spice-indices.com/idp2/intraday/effectivedate/11-14-2015/intraday-announcements.json", data={
        "start": "0",
        "limit": "10",
        "indexKey": "SPUSA-TMI-USDUF--P-US----"
    })

    data = response.json()["widget_data"]
    for item in data:
        print(item["EVENT_NAME"])

Prints:

Dividend
Weekly Share Change
Special Dividend
Merger/Acquisition
Merger/Acquisition
Drop
Merger/Acquisition
Merger/Acquisition
Drop
Identifier Changes

Note that the effective date is part of the URL itself: see the 11-14-2015 segment.
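
Since the date is embedded in the path, the URL can be built for other days instead of being hard-coded. The following is a minimal sketch, not part of the original code: the helper name `build_announcements_url` is hypothetical, and the MM-DD-YYYY format (with zero padding) is inferred only from the single 11-14-2015 example, so check it against what the browser actually sends.

```python
from datetime import date

# URL pattern taken from the sample code above, with the date segment
# replaced by a placeholder.
URL_TEMPLATE = (
    "https://www.spice-indices.com/idp2/intraday/effectivedate/"
    "{effective_date}/intraday-announcements.json"
)


def build_announcements_url(day):
    """Return the endpoint URL for the given datetime.date.

    ASSUMPTION: the site expects MM-DD-YYYY with zero padding; this is
    inferred from the one example date (11-14-2015) and is not verified.
    """
    return URL_TEMPLATE.format(effective_date=day.strftime("%m-%d-%Y"))


print(build_announcements_url(date(2015, 11, 14)))
```

The resulting string can be passed to `session.post(...)` in place of the hard-coded URL.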

2 Comments

uday Over a year ago

One quick question about this page: http://www.ftse.com/products/index-notices/home/getnotices/?id=GEISAC&title=. How do I find the JSON or the equivalent request?

alecxe Over a year ago

@uday glad to help. Just use the browser developer tools and inspect what requests are made. After a quick look, I think there is a GET request made to the http://www.ftse.com/products/index-notices/Backend/GetNotices endpoint, which returns the list of notices on a page. If you have difficulties getting the desired data, consider posting a new, separate question so that more people can help. Thanks!
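
A rough sketch of how such a GET request might be issued with the standard library is shown below. It is only an illustration under stated assumptions: the helper names are hypothetical, the query parameters are copied from the page URL in the question (the backend endpoint may expect different ones), and it is not confirmed that the endpoint responds with JSON. Confirm all of this in the browser developer tools first.

```python
import json
import urllib.request as web
from urllib.parse import urlencode

# Endpoint mentioned in the comment above.
NOTICES_URL = "http://www.ftse.com/products/index-notices/Backend/GetNotices"


def build_notices_url(params):
    """Append URL-encoded query parameters to the endpoint, if any."""
    if not params:
        return NOTICES_URL
    return NOTICES_URL + "?" + urlencode(params)


def fetch_notices(params):
    """Send the GET request and parse the body.

    ASSUMPTION: the response body is UTF-8 encoded JSON; this is unverified.
    """
    with web.urlopen(build_notices_url(params)) as response:
        return json.loads(response.read().decode("utf-8"))


# ASSUMPTION: these parameters come from the page URL in the question and
# may not be the ones the backend endpoint actually expects.
print(build_notices_url({"id": "GEISAC", "title": ""}))
```

Calling `fetch_notices({"id": "GEISAC", "title": ""})` would perform the actual request; only the URL is built here so that the sketch runs without network access.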
