I'm trying to scrape some product information from Amazon, but I don't know why my code doesn't work. Every time I run these lines, both print calls output None. I'm using Visual Studio.

import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.amazon.it/Xiaomi-frequenza-Monitoraggio-Bracciale-Smartwatch/dp/B07T9DHKXL?pf_rd_r=F2MMPNCJR5AQ4KP5C82P&pf_rd_p=ff59f7ef-650d-5e5a-9ee5-6fd80bb0e21d&pd_rd_r=12e6add2-54cd-44b1-bfa4-81c70ad68010&pd_rd_w=Lo5MD&pd_rd_wg=t2rFz&ref_=pd_gw_ri"
)
soup = BeautifulSoup(page.content,'html.parser')
title = soup.find(id='productTitle')
price = soup.find(id='priceblock_ourprice')
print(title)
print(price)
  • Did you try checking the value of soup? Does it contain these elements? Commented Jun 9, 2020 at 19:12

3 Answers

Andrej Kesely posted the answer while I was typing, but to understand why this happens, add this print line right after the soup = ... line:

soup = BeautifulSoup(page.content,'html.parser')
print(soup.find_all("title"))
title = soup.find(id='productTitle')

This will print:

[<title dir="ltr">Amazon CAPTCHA</title>]

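Amazon isn't serving the real product page to your code; it is responding with a CAPTCHA page. That page contains neither productTitle nor priceblock_ourprice, so both find calls return None.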
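
If you want the script to notice this by itself instead of silently printing None, one option is to check the page title before looking for the product elements. This is only a minimal sketch: is_captcha_page is a made-up helper name, and it assumes the block page always has "CAPTCHA" in its <title>, which is what was observed here but not something Amazon guarantees.

```python
from bs4 import BeautifulSoup


def is_captcha_page(html):
    """Heuristic check (an assumption, not an official signal):
    return True if the page <title> mentions a CAPTCHA."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("title")
    if title_tag is None:
        # No <title> at all, so nothing suggests a CAPTCHA page.
        return False
    return "captcha" in title_tag.get_text().lower()
```

You would call it as is_captcha_page(page.content) right after requests.get and stop (or retry with different headers) when it returns True.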

There are two problems:

1.) Send a User-Agent HTTP header. Without it, Amazon responds with a CAPTCHA page.

2.) Select html5lib or lxml as the parser. html.parser has problems parsing this page.

import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.it/Xiaomi-frequenza-Monitoraggio-Bracciale-Smartwatch/dp/B07T9DHKXL'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html5lib') # or 'lxml'

title = soup.find(id='productTitle')
price = soup.find(id='priceblock_ourprice')
print(title.get_text(strip=True))
print(price.get_text(strip=True))

Prints:

Xiaomi Mi Band 4 Activity Tracker,Monitor attività,Monitor frequenza cardiaca Monitoraggio Fitness, Bracciale Smartwatch con Schermo AMOLED a Colori 0,95, con iOS e Android (Versione Globale)
30,96 €
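
Note that title.get_text(...) raises an AttributeError whenever find returns None, for example if Amazon still serves a CAPTCHA or changes the element ids. A slightly more defensive variant might look like the sketch below. make_soup and text_by_id are hypothetical helper names, and the parser order (lxml, then html5lib, then the built-in html.parser as a last resort) is just one reasonable choice.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, parsers=("lxml", "html5lib", "html.parser")):
    """Build a soup with the first parser that is actually installed."""
    for parser in parsers:
        try:
            return BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


def text_by_id(soup, element_id):
    """Return the stripped text of the element with this id, or None."""
    tag = soup.find(id=element_id)
    return tag.get_text(strip=True) if tag is not None else None
```

With these, soup = make_soup(requests.get(url, headers=headers).content) followed by text_by_id(soup, 'productTitle') gives either the text or None, so you can test for None before printing.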

1 Comment

Thanks, I had to install lxml using sudo pip3 install lxml, and then it worked.
Many modern websites, including Amazon, load parts of their pages dynamically with JavaScript. When you send a request with requests.get, you only get the initial HTML, without the dynamically loaded content. You could use a library like Selenium to render such pages in a real browser and then pass the page source to Beautiful Soup.
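
A rough sketch of that approach, assuming Selenium 4 and a local Chrome installation. fetch_rendered_html is a made-up name, and waiting for the productTitle id is an assumption carried over from the question; if the element never appears (for example on a CAPTCHA page), the wait raises a TimeoutException.

```python
def fetch_rendered_html(url, element_id="productTitle", wait_seconds=10):
    """Load url in headless Chrome and return the rendered page source."""
    # Imported here so this module still loads when Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element exists in the DOM, or time out.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait fails.
        driver.quit()
```

The returned string can then be handed to BeautifulSoup(html, 'lxml') exactly like page.content in the question.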
