I'm trying to scrape the content from the following website:

https://mobile.admiral.at/en/event/event/all#/event/15a822ab-84a1-e511-90a2-000c297013a7

I have previously scraped the content successfully using dryscrape and the following code:

import dryscrape
import webkit_server
from lxml import html

session = dryscrape.Session()
session.set_timeout(20)
session.set_attribute('auto_load_images', False)
session.visit('https://mobile.admiral.at/en/event/event/all#/event/15a822ab-84a1-e511-90a2-000c297013a7')
response = session.body()
tree = html.fromstring(response)

print(tree.xpath('(//td[@class="team-name"]/text())[1]'))

The above example would print the home team (in this case, 'France').

It seems that the structure of the source has been changed, so I'm unable to scrape the contents properly.

What confuses me is that I can see the tags in the Firefox Inspector tool, but they are not present in the response when I pull the source.

I assume they must have hidden the content somehow to make it impossible (?) to scrape the data.

Could someone please point me in the right direction on how to scrape the content properly?

1 Answer

The content you need is loaded with jQuery (Ajax). I don't know whether dryscrape has been updated lately, but the last time I used it, it didn't support content loaded via Ajax from jQuery.

Anyway, if you look at the Network tab of Chrome's developer tools, you will see that the main content is loaded from an API. You can call that API directly and get a JSON response with all the data for the page:

import requests
data = requests.get('https://mobile.admiral.at/;apiVer=json;api=main;jsonType=object;apiRw=1/en/api/event/get-event?id=15a822ab-84a1-e511-90a2-000c297013a7').json()
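If you need this for more than one event, a rough sketch along these lines may help. It assumes the event id is always the part of the page URL after `#/event/` and that the endpoint prefix from the request above stays the same; both are guesses based on this single example. The helper names (`event_id_from_page_url`, `api_url_for`, `fetch_event`) are made up for illustration, and I haven't documented the shape of the returned JSON, so inspect it before relying on any particular key.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint copied from the request seen in the network inspector.
# Assumption: this prefix is the same for every event.
API_URL = (
    "https://mobile.admiral.at/;apiVer=json;api=main;jsonType=object;apiRw=1"
    "/en/api/event/get-event"
)


def event_id_from_page_url(page_url):
    """Return the id that follows '/event/' in the URL fragment."""
    fragment = urlsplit(page_url).fragment  # the part after '#'
    prefix = "/event/"
    if not fragment.startswith(prefix) or len(fragment) == len(prefix):
        raise ValueError("no event id in URL fragment: %r" % page_url)
    return fragment[len(prefix):]


def api_url_for(page_url):
    """Build the API URL for the event shown on page_url."""
    return API_URL + "?" + urlencode({"id": event_id_from_page_url(page_url)})


def fetch_event(page_url, timeout=20):
    """Fetch and decode the JSON for one event (performs a network request)."""
    # Imported here so the URL helpers above work without requests installed.
    import requests

    response = requests.get(api_url_for(page_url), timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

Calling `fetch_event(...)` with the page URL from the question should issue the same request as the one-liner above, provided the assumptions hold.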
2 Comments

I have exactly the same problem with this website: pib.nic.in/PressReleseDetail.aspx?PRID=1573651. I can see the entire text via 'Inspect Element', but I cannot extract it with Selenium (Python). Any idea how to overcome this? Thanks in advance.
Exactly as in the previous post: if you look at the network inspector, you will see that the URL that loads the content is pib.gov.in/PressReleasePage.aspx?PRID=1573651, which is requested via Ajax.
