I am trying to scrape pages like this one using Python 3.5. I have already scraped most of the content with BeautifulSoup, but I am having trouble extracting the number of sizes. On this specific page there are 9 sizes (FR 80 A, FR 80 B, FR 80 C, etc.). I believe this information is embedded in the page as JSON. I am trying to parse it with the json module, but I can't work out the 'start' and 'end' offsets of the JSON text. My code looks like this:

import requests
import json

page = requests.get('https://www.laperla.com/fr/en/cfiplm000566-bgw532.html')
content = page.text    
start = content.find('spConfig') + ...
end = ...    
data = json.loads(content[start:end])
sizes = data['attributes']['179']['options']
print(len(sizes))

The correct output should be 9, since there are 9 sizes. I don't want to use Selenium or similar packages. What are the correct 'start' and 'end' values? Is there a better way to scrape this data than what I am trying to do?

1 Answer

1. Iterate over all script tags and find the one that contains the target JSON.

2. Use a regex to grab the start and end of the JSON.

3. Parse the captured text with the json module.

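import json
import re
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://www.laperla.com/fr/en/cfiplm000566-bgw532.html').text, 'html.parser')
# Find the script that calls Product.Config(...) and capture its JSON argument.
for i in soup.select('script'):
    if 'Product.Config' in str(i):
        data = re.search(r'(?is)(Product\.Config\()(.*?)(\))', str(i)).group(2)

json_data = json.loads(data)
print(len(json_data['attributes']['179']['options']))
9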
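
One caveat with the regex: the non-greedy match stops at the first ')' it meets, so it would cut the JSON short if any value (a label, for instance) contained a closing parenthesis. A possible alternative is to find only the 'start' and let the json module work out the 'end' with JSONDecoder.raw_decode, which parses one JSON value from a given index and ignores whatever follows it. The following is a minimal sketch, assuming the config is a single JSON object that follows 'Product.Config(' in the page source. The helper name extract_json_after and the sample string are hypothetical stand-ins for illustration, not the real page content.

```python
import json


def extract_json_after(text, marker):
    """Return the first JSON object that appears after `marker` in `text`.

    Hypothetical helper: only the start offset is searched for; the end
    is determined by the JSON parser itself.
    """
    marker_pos = text.find(marker)
    if marker_pos == -1:
        raise ValueError('marker not found: %r' % marker)
    # raw_decode does not skip leading whitespace, so point it at the '{'.
    start = text.find('{', marker_pos)
    if start == -1:
        raise ValueError('no JSON object found after marker')
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


# Made-up sample standing in for the page source (assumption, not real data).
sample = (
    '<script>var spConfig = new Product.Config('
    '{"attributes": {"179": {"options": '
    '[{"label": "FR 80 A (x)"}, {"label": "FR 80 B"}]}}}'
    ');</script>'
)

config = extract_json_after(sample, 'Product.Config(')
print(len(config['attributes']['179']['options']))
```

On the real page you would pass page.text instead of sample. The parenthesis inside the first label is there on purpose: it is the case where the regex above would stop too early.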