
    import requests
    from bs4 import BeautifulSoup as bs
    import csv

    r = requests.get('https://portal.karandaaz.com.pk/dataset/total-population/1000')
    soup = bs(r.text)
    table = soup.find_all(class_='ag-header-cell-text')

This gives me no results (an empty list). Any idea how to scrape the data from this site? I would appreciate it.


2 Answers


BeautifulSoup can only see what's directly baked into the HTML of a resource at the time it is initially requested. The content you're trying to scrape isn't baked into the page, because normally, when you view this particular page in a browser, the DOM is populated asynchronously using JavaScript. Fortunately, logging your browser's network traffic reveals requests to a REST API, which serves the contents of the table as JSON. The following script makes an HTTP GET request to that API, given a desired "dataset_id" (you can change the key-value pair in the params dict as desired). The response is then dumped into a CSV file:

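import csv
import sys

import requests


def main():
    # JSON endpoint found by logging the browser's network traffic
    url = "https://portal.karandaaz.com.pk/api/table"

    # Change "dataset_id" to fetch a different dataset
    params = {
        "dataset_id": "1000"
    }

    # timeout avoids hanging forever if the server does not respond
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)

    content = response.json()

    filename = "dataset_{}.csv".format(params["dataset_id"])

    # newline="" lets the csv module control line endings
    with open(filename, "w", newline="", encoding="utf-8") as file:
        # Column names come first; each row is a list in the same order
        fieldnames = content["data"]["columns"]

        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()

        for row in content["data"]["rows"]:
            writer.writerow(dict(zip(fieldnames, row)))

    return 0


if __name__ == "__main__":
    sys.exit(main())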
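To check the CSV step in isolation, here is a minimal offline sketch. It assumes the API response has the shape the script above relies on, `{"data": {"columns": [...], "rows": [[...], ...]}}`; the helper name `rows_to_csv` and the sample column names and values are hypothetical placeholders, not real data from the portal.

```python
import csv
import io


def rows_to_csv(content):
    """Convert a payload shaped like {"data": {"columns": [...], "rows": [...]}} to CSV text.

    The payload shape is an assumption taken from the script above.
    """
    fieldnames = content["data"]["columns"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in content["data"]["rows"]:
        # Each row is a list of values in the same order as the column names
        writer.writerow(dict(zip(fieldnames, row)))
    return buffer.getvalue()


# Hypothetical sample payload (placeholder values, not real portal data)
sample = {
    "data": {
        "columns": ["year", "region", "population"],
        "rows": [[2017, "Region A", 100], [2017, "Region B", 250]],
    }
}

print(rows_to_csv(sample))
```

Writing to `io.StringIO` makes the conversion easy to inspect without touching disk. In the real script, keep `open(filename, "w", newline="")`, where the csv module's default `\r\n` line ending is used instead of the `\n` chosen here.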

4 Comments

Thanks, Paul. Your code seems very complicated to me, and it gives me the error "No module named 'env'". Any idea?
@SajjadAli Sorry about that, I made a mistake copy-pasting my code. The first two lines should not have been there. Refresh the page and try my updated code.
Thanks, Paul. You have saved me another day of trying to find the solution. Stack Overflow just rocks: I just signed up, posted the question, and got the answer I was looking for. By the way, how did you find out about the other URL? It goes over my head.
@SajjadAli You're welcome. About the URL, you may want to read this answer I posted for a different question, where someone was trying to scrape information about wines and vineyards. In it, I explain the steps you need to take to log your browser's network traffic, and how to formulate requests to an API.
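Once the API endpoint shows up in the browser's network log, reproducing the request mostly means copying its URL and query string. As a small illustration, the standard library can build the URL that `requests.get(url, params=params)` in the answer above requests, so it can be compared with the entry in the network tab. This assumes `requests` encodes a flat dict of string values the same way `urllib.parse.urlencode` does, which holds for simple values like this one; `build_api_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer's script
API_URL = "https://portal.karandaaz.com.pk/api/table"


def build_api_url(dataset_id):
    """Return the full GET URL for one dataset (hypothetical helper)."""
    # Same query string that requests builds from params={"dataset_id": ...}
    return f"{API_URL}?{urlencode({'dataset_id': dataset_id})}"


# Compare this against the request listed in the browser's network tab
print(build_api_url("1000"))
```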

The tag you're searching for isn't in the page source that requests receives, which is why you're getting no data. Is there some reason you expect it to be there? You may be seeing different source code in a browser than you do when pulling it with the requests library.

You can view the HTML that requests actually receives with:

    import requests
    from bs4 import BeautifulSoup as bs

    r = requests.get('https://portal.karandaaz.com.pk/dataset/total-population/1000')
    soup = bs(r.text, "lxml")  # the "lxml" parser requires the lxml package to be installed
    print(soup)
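A quick way to confirm this is to check whether the class you are searching for occurs at all in the raw HTML before parsing further. The sketch below uses the standard library's `html.parser` (swapped in for BeautifulSoup so it has no third-party dependency) and runs against two hypothetical sample pages: one mimicking the served HTML of a JavaScript-rendered table (an empty container plus a script), and one mimicking what the browser's inspector shows after the script has run. `ClassCounter` and `count_class` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # class="a b c" can hold several names separated by whitespace
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical served HTML: the grid is filled in later by JavaScript
raw_html = """
<div id="grid"></div>
<script src="app.js"></script>
"""

# Hypothetical HTML after the script has run (what the inspector shows)
rendered_html = """
<div id="grid">
  <span class="ag-header-cell-text">Year</span>
  <span class="ag-header-cell-text">Population</span>
</div>
"""

print(count_class(raw_html, "ag-header-cell-text"))       # 0
print(count_class(rendered_html, "ag-header-cell-text"))  # 2
```

With the live page, you would pass `r.text` from the snippet above as `html`; a count of 0 suggests the element is created client-side rather than served in the HTML.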

2 Comments

Yeah, I was looking at the code in the browser's Inspect tool, and it shows what I wrote in my code, but you're right: the code I get in Jupyter is different. I don't know why; I'm very new to scraping and still learning. Thanks.
Likely there is some JavaScript executing in your browser to generate this HTML. The other answer has more information on how to work around that.
