Scraping in Python shows None value [duplicate]

Question

    import requests
    from bs4 import BeautifulSoup as bs
    import csv

    r = requests.get('https://portal.karandaaz.com.pk/dataset/total-population/1000')
    soup = bs(r.text)
    table = soup.find_all(class_='ag-header-cell-text')

This gives me a None value. Does anyone have an idea how to scrape the data from this site? I would appreciate any help.

Paul M. · Accepted Answer · 2021-03-31 16:38:36Z


BeautifulSoup can only see what's baked into the HTML of a resource at the time it is first requested, and requests does not execute JavaScript. The content you're trying to scrape isn't in that HTML: when you view this page in a browser, the table is populated asynchronously by JavaScript after the page loads. Fortunately, logging your browser's network traffic reveals requests to a REST API, which serves the contents of the table as JSON. The following script makes an HTTP GET request to that API for a given "dataset_id" (change the key-value pair in the params dict as desired) and dumps the response into a CSV file:

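import csv
import sys

import requests


def main():
    # JSON endpoint the page calls from JavaScript (found by logging the browser's network traffic)
    url = "https://portal.karandaaz.com.pk/api/table"

    # Change dataset_id to fetch a different dataset
    params = {
        "dataset_id": "1000"
    }

    response = requests.get(url, params=params)
    response.raise_for_status()  # stop with an exception on HTTP 4xx/5xx

    content = response.json()

    filename = "dataset_{}.csv".format(params["dataset_id"])

    with open(filename, "w", newline="") as file:
        # The API returns the column names and the rows (lists of values) separately
        fieldnames = content["data"]["columns"]

        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()

        for row in content["data"]["rows"]:
            # Pair each value with its column name before writing
            writer.writerow(dict(zip(fieldnames, row)))

    return 0


if __name__ == "__main__":
    sys.exit(main())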
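If you want to check the JSON-to-CSV step without hitting the network, the conversion can be pulled out into a pure function. This is a sketch that assumes the payload has the same shape the script above reads (a "data" object holding a "columns" list and a "rows" list of lists); the function name is hypothetical. Because each row is already a list in column order, csv.writer can write it directly instead of going through DictWriter and zip.

```python
import csv
import io


def table_payload_to_csv(content):
    """Render a payload shaped like {"data": {"columns": [...], "rows": [[...], ...]}} as CSV text.

    The payload shape is assumed from the answer's script, not from API documentation.
    """
    columns = content["data"]["columns"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default "excel" dialect: comma-separated, "\r\n" line endings
    writer.writerow(columns)  # header row
    writer.writerows(content["data"]["rows"])  # each row is already in column order
    return buffer.getvalue()
```

For example, passing a made-up payload such as {"data": {"columns": ["year", "population"], "rows": [[2017, 1234]]}} returns a header line followed by one data row. To save the result, write the string to a file opened with newline="" so the "\r\n" line endings are not translated.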

answered Mar 31, 2021 at 16:38

Paul M.



4 Comments

Sajjad Ali Over a year ago

Thanks, Paul. Your code seems very complicated to me, and it gives me the error "No module named 'env'". Any idea?

Paul M. Over a year ago

@SajjadAli Sorry about that, I made a mistake copy-pasting my code. The first two lines should not have been there. Refresh the page and try my updated code.

Sajjad Ali Over a year ago

Thanks, Paul. You've saved me another day of searching for a solution. Stack Overflow just rocks: I just signed up, posted the question, and got the answer I was looking for. By the way, how did you find the other URL? That goes over my head.

Paul M. Over a year ago

@SajjadAli You're welcome. About the URL, you may want to read this answer I posted for a different question, where someone was trying to scrape information about wines and vineyards. In it, I explain the steps you need to take to log your browser's network traffic, and how to formulate requests to an API.
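Once the browser's Network tab shows the request the page makes (typically an XHR/Fetch request that returns JSON), the copied URL can be split into a base URL and a params dict for requests.get, then joined back to confirm nothing was lost. This is a stdlib-only sketch; split_request_url and join_request_url are hypothetical helper names, and the example URL is the one the script in Paul M.'s answer requests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_request_url(copied_url):
    """Split a URL copied from the Network tab into (base_url, params) for requests.get."""
    parts = urlsplit(copied_url)
    # Drop the query string and fragment to get the bare endpoint
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Turn "a=1&b=2" into {"a": "1", "b": "2"}; a repeated key keeps its last value
    params = dict(parse_qsl(parts.query))
    return base_url, params


def join_request_url(base_url, params):
    """Rebuild the full GET URL from a base URL and a params dict."""
    return f"{base_url}?{urlencode(params)}" if params else base_url


if __name__ == "__main__":
    url = "https://portal.karandaaz.com.pk/api/table?dataset_id=1000"
    base_url, params = split_request_url(url)
    print(base_url)  # https://portal.karandaaz.com.pk/api/table
    print(params)  # {'dataset_id': '1000'}
    # With requests installed, the equivalent call would be:
    # requests.get(base_url, params=params)
```

Keeping the parameters in a dict rather than a hard-coded query string is what makes it easy to swap in a different dataset_id later.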

Justin Bodnar · Accepted Answer · 2021-03-31 16:37:42Z


The tag you're searching for isn't in the source code, which is why you're getting no data back. Is there some reason you expect it to be there? You may be seeing different source code in a browser than you get when pulling the page with the requests library.

You can view the HTML that requests actually receives with:

    import requests
    from bs4 import BeautifulSoup as bs

    r = requests.get('https://portal.karandaaz.com.pk/dataset/total-population/1000')
    # Parse the raw HTML exactly as the server sent it (requests runs no JavaScript)
    soup = bs(r.text, "lxml")
    print(soup)
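One way to confirm that the element is missing from what requests receives, rather than the selector being wrong, is to collect every class name in the raw HTML and look for the one you want. Keep in mind that BeautifulSoup's find_all returns an empty list, not None, when nothing matches (find returns None). This sketch uses the standard library's html.parser instead of BeautifulSoup so it has no dependencies; the class and function names and both sample HTML strings are illustrative, not taken from the real page.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record every CSS class name used on any start tag in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def class_in_html(html, class_name):
    """Return True if class_name appears on any tag in the given HTML text."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return class_name in collector.classes


if __name__ == "__main__":
    # Hypothetical server response: an empty mount point that JavaScript fills in later
    static_html = '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'
    # Hypothetical DOM as the browser's inspector shows it after the scripts have run
    rendered_html = '<div class="ag-header-cell"><span class="ag-header-cell-text">Year</span></div>'

    print(class_in_html(static_html, "ag-header-cell-text"))  # False
    print(class_in_html(rendered_html, "ag-header-cell-text"))  # True
```

With the real page you would pass r.text from the requests.get call above. A False result there, while the browser's inspector does show the class, points to content rendered by JavaScript.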

answered Mar 31, 2021 at 16:37

Justin Bodnar


2 Comments

Sajjad Ali Over a year ago

Yeah, I was looking at the code in the browser's inspector and it shows what I wrote in my code, but you're right: the HTML I get in Jupyter is different. I don't know why; I'm very new to scraping and still learning. Thanks.

Justin Bodnar Over a year ago

This is likely because some JavaScript runs in your browser to generate this HTML. Paul M.'s answer has more information on how to work around that.
