Scrape content in json format - Python

Question

I am trying to scrape pages like this one using Python 3.5. I have already scraped most of the content with BeautifulSoup, but I am having trouble getting the number of sizes. On this specific page there are 9 sizes (FR 80 A, FR 80 B, FR 80 C, etc.). I believe this information is embedded in the page in JSON format. I am trying to use the json package, but I can't work out the 'start' and 'end' indices of the JSON in the page source. My code looks like this:

import requests
import json

page = requests.get('https://www.laperla.com/fr/en/cfiplm000566-bgw532.html')
content = page.text    
start = content.find('spConfig') + ...
end = ...    
data = json.loads(content[start:end])
sizes = data['attributes']['179']['options']
print(len(sizes))

The correct output should be 9, since there are 9 sizes. I don't want to use Selenium or similar packages. So, what are the correct 'start' and 'end'? Is there a better way to scrape this data than what I am trying to do?

akash karothiya · Accepted Answer · 2017-10-17 11:44:47Z

1. Iterate over all script tags and find the one containing the target JSON.

2. Use a regex to capture the text between the start and end of the JSON.

3. Parse the captured text with the json module.

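import json
import re
import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.laperla.com/fr/en/cfiplm000566-bgw532.html')
soup = BeautifulSoup(page.text, 'html.parser')
for script in soup.select('script'):
    if 'Product.Config' in str(script):
        data = re.search(r'(?is)(Product\.Config\()(.*?)(\))', str(script)).group(2)

json_data = json.loads(data)
print(len(json_data['attributes']['179']['options']))
9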
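
One caveat with the regex above: the non-greedy `(.*?)(\))` stops at the first `)` it meets, so it would cut the JSON short if any value in it contained a closing parenthesis. A possible alternative, shown here only as a sketch, is to locate the marker with `str.find` (the 'start' from the question) and let `json.JSONDecoder().raw_decode` work out where the JSON ends. The helper name `extract_config` and the `sample` string are hypothetical stand-ins and are not taken from the real page.

```python
import json


def extract_config(html, marker='Product.Config('):
    """Return the JSON value that follows `marker` in `html`, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    start += len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand.
    while start < len(html) and html[start].isspace():
        start += 1
    # raw_decode parses one JSON value beginning at `start` and ignores
    # whatever comes after it, so no explicit 'end' index is needed.
    obj, _end = json.JSONDecoder().raw_decode(html, start)
    return obj


# Hypothetical stand-in for the page source (made-up labels, 3 sizes only).
sample = '''<script>
var spConfig = new Product.Config({"attributes": {"179": {"options": [
    {"label": "FR 80 A (EU)"}, {"label": "FR 80 B"}, {"label": "FR 80 C"}]}}});
</script>'''

config = extract_config(sample)
sizes = config['attributes']['179']['options']
print(len(sizes))  # 3 for this sample
```

With the real page, `extract_config(page.text)` would take the place of the regex step, assuming the marker `Product.Config(` still appears in the page source.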
