
I am trying to get the image links of products from a website. It works for some products but not for others. In the code below, url1 works, but url2 raises "json.decoder.JSONDecodeError". I think the problem is that my regular expression does not capture the whole JSON string, and I am not good at regular expressions. How can I extract the JSON string?


Code

import re,json,requests
url1 =  "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
r = requests.get(url2)
data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?);', r.text).group(1))
images = ['https://www.trendyol.com' + img for img in data['product']['images']]
print(images)

2 Answers

1

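The following regex is a better match for the URLs you gave. Your original pattern, (.*?);, is lazy and stops at the first semicolon, which can fall inside the JSON itself, so json.loads receives a truncated string. Ending the pattern with \}\}; makes the match stop where the nested dictionaries close, before the next block starts.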

import re,json,requests

url1 =  "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"

for url in [url1, url2]:
    r = requests.get(url)
    data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?\}\});', r.text).group(1))
    images = ['https://www.trendyol.com' + img for img in data['product']['images']]
    print(images)
    print("")
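
If the page layout changes, the \}\}; terminator may stop matching, or match too early. A possible alternative, sketched below, is to skip the regex and let the standard library's json.JSONDecoder.raw_decode find the end of the object: it parses one JSON value starting at a given index and ignores whatever follows. The function name extract_state and the sample HTML string are made up for illustration; the marker text is the one used in the regex above.

```python
import json

# Marker taken from the regex above; everything after it is assumed to be JSON.
MARKER = "PRODUCT_DETAIL_APP_INITIAL_STATE__="


def extract_state(html, marker=MARKER):
    """Return the JSON object that starts right after `marker` in `html`."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found in page source")
    start += len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand.
    while start < len(html) and html[start].isspace():
        start += 1
    # Parses exactly one JSON value and returns it with the end index.
    data, _end = json.JSONDecoder().raw_decode(html, start)
    return data


# Hypothetical page fragment: note the semicolon inside a string value,
# which is the kind of input that breaks the (.*?); pattern.
sample = (
    '<script>window.__PRODUCT_DETAIL_APP_INITIAL_STATE__='
    '{"product":{"name":"TV; remote","images":["/a.jpg","/b.jpg"]}};'
    'window.OTHER={};</script>'
)

state = extract_state(sample)
images = ["https://www.trendyol.com" + img for img in state["product"]["images"]]
print(images)
```

With a real page you would pass r.text instead of sample. Semicolons and closing braces inside string values do not matter here, because the decoder tracks the JSON structure rather than looking for a fixed terminator.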


1

You can try this:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0',
}

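url = 'https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135'
# Send the headers defined above so the request carries a browser User-Agent
r = requests.get(url, headers=headers)
# Name the parser explicitly; r.text is already a str, so no encoding step is needed
soup = BeautifulSoup(r.text, 'html.parser')

# Print the source of every <img> tag, skipping tags without a src attribute
for img in soup.find_all('img'):
    src = img.get('src')
    if src:
        print(src)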

1 Comment

Thank you. I tried that approach before, but it only returns low-resolution images. I need the high-resolution images of the product.
