I typically use the following template script to download data from a website:

import urllib.request as web
from bs4 import BeautifulSoup
...
url_to_visit = 'http://www.website-link-to-download-data'
source_code = web.urlopen(url_to_visit).read()
# map each byte to a character to turn the bytes response into a str
source_code = ''.join(map(chr, source_code))
source_code = source_code.split('\n')
## then further process the lines returned in `source_code` as needed

But sometimes I come across very difficult sites.

Consider the site https://www.spice-indices.com/idp2/Main#home. From the first table, Intraday Alerts - United States, I want to download, via a Python script, the information that is displayed when I click the SP TMI tab.

I looked at the lines in `source_code` above, but I couldn't figure out how to extract the information I want. The site seems to use JavaScript to load and display the information. Can someone give me any pointers or suggestions?

I am using Python 3.x.

1 Answer

When you activate the "SP TMI" tab, a POST request is sent to the "intraday-announcements.json" endpoint. Simulate that request in your code and parse the JSON response.

Sample working code using requests:

import requests

with requests.Session() as session:
    session.get("https://www.spice-indices.com/idp2/Main#home")

    response = session.post("https://www.spice-indices.com/idp2/intraday/effectivedate/11-14-2015/intraday-announcements.json", data={
        "start": "0",
        "limit": "10",
        "indexKey": "SPUSA-TMI-USDUF--P-US----"
    })

    data = response.json()["widget_data"]
    for item in data:
        print(item["EVENT_NAME"])

Prints:

Dividend
Weekly Share Change
Special Dividend
Merger/Acquisition
Merger/Acquisition
Drop
Merger/Acquisition
Merger/Acquisition
Drop
Identifier Changes

Note that the effective date is part of the URL itself: see the 11-14-2015 segment.
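Since the date is embedded in the path, the URL can be built for any day instead of being hard-coded. Here is a minimal sketch; the helper name `build_announcements_url` is hypothetical, and the zero-padded MM-DD-YYYY format is an assumption inferred from the single `11-14-2015` example above, so check it against what the browser actually sends:

```python
import datetime

# Path prefix taken from the POST URL in the sample code above.
BASE_URL = "https://www.spice-indices.com/idp2/intraday/effectivedate"


def build_announcements_url(effective_date):
    """Return the announcements endpoint URL for a datetime.date.

    Assumption: the site expects zero-padded MM-DD-YYYY, as in "11-14-2015".
    """
    date_part = effective_date.strftime("%m-%d-%Y")
    return "{}/{}/intraday-announcements.json".format(BASE_URL, date_part)


url = build_announcements_url(datetime.date(2015, 11, 14))
print(url)
```

The resulting `url` can then be passed to `session.post(...)` in place of the literal string used in the sample code.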

2 Comments

One quick question about this page: http://www.ftse.com/products/index-notices/home/getnotices/?id=GEISAC&title=. How do I find the JSON or the equivalent request?
@uday glad to help. Just use browser developer tools and inspect what requests are made. After a quick look, I think there is a GET request made to http://www.ftse.com/products/index-notices/Backend/GetNotices endpoint that contains the list of notices on a page. If you have difficulties getting the desired data, consider making a new separate question so that more people can help. Thanks!
