I am trying to scrape the pricing information from these two websites: site1 and site2. I am using Python with the BeautifulSoup and requests packages.

What I realized is that the pricing section does not appear in the page source of either site, so I am wondering how I can scrape the data.

Any advice would be appreciated. Thank you.

  • If you highlight the section and inspect the element, you can see the information. But if you only view the page source, the pricing section is not there. Commented Jun 27, 2014 at 23:00
  • Why is the post marked with such a strange "Too broad" flag? The question is very specific about a specific problem on a specific website. Commented Jun 27, 2014 at 23:40
  • Would you be able to comment on the second link? Can I approach it in a similar fashion? Commented Jun 27, 2014 at 23:43
  • Furthermore, do you know if I could get the same information by searching by CAS #? Ideally, I would like to submit a list of CAS #s to Python and get the prices back. Commented Jun 27, 2014 at 23:47
  • I apologize if I am asking too much. But I am new to Python and have spent the last month working on this code to no avail. Thank you again for your kind help. Commented Jun 27, 2014 at 23:48

1 Answer

The problem is that you first need to select a country before the prices are shown.

In technical terms, you need to make a POST request to http://www.strem.com/catalog/index.php to select a country; after that you can get the prices:

from bs4 import BeautifulSoup
import requests

URL = "http://www.strem.com/catalog/v/29-6720/17/copper_1300746-79-5"

# Use a session so the country selection carries over to later requests.
session = requests.Session()

# Select a country first; the prices are not shown without it.
session.post("http://www.strem.com/catalog/index.php",
             data={'country': 'USA',
                   'page_function': 'select_country',
                   'item_id': '7211',
                   'group_id': '17'})

# Fetch the product page and collect the text of every price cell.
response = session.get(URL)
soup = BeautifulSoup(response.content, 'html.parser')
print([td.text.strip() for td in soup.find_all('td', class_='price')])

This prints:

['US$85.00', 'US$285.00', 'US$1,282.00', 'US$3,333.00']

A more elegant solution would be to submit the form using the mechanize package:

from bs4 import BeautifulSoup
import mechanize

URL = "http://www.strem.com/catalog/v/29-6720/17/copper_1300746-79-5"

# Give the browser a cookie jar so cookies set by the site are kept between requests.
browser = mechanize.Browser()
cj = mechanize.LWPCookieJar()
browser.set_cookiejar(cj)

# Open the product page and submit the country form (nr=1 is the second form on the page).
browser.open(URL)
browser.select_form(nr=1)
browser.form['country'] = ['USA']
browser.submit()

# Parse the response to the form submission and collect the price cells.
data = browser.response().read()
soup = BeautifulSoup(data, 'html.parser')
print([td.text.strip() for td in soup.find_all('td', class_='price')])

This prints:

['US$85.00', 'US$285.00', 'US$1,282.00', 'US$3,333.00']
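
If you want to compare or sum the prices, you will probably want numbers rather than strings like 'US$1,282.00'. Here is a minimal sketch that uses only the standard library. The `parse_price` helper and the `PRICE_RE` pattern are my own illustration, not part of either package, and they assume every price has the same shape as the four values printed above (an optional uppercase prefix, a dollar sign, and comma-grouped digits).

```python
import re
from decimal import Decimal

# Hypothetical pattern: assumes prices look like 'US$1,282.00'.
PRICE_RE = re.compile(r'^(?P<currency>[A-Z]{2,3})?\$(?P<amount>[\d,]+(?:\.\d+)?)$')


def parse_price(text):
    """Split a price string into (currency prefix, Decimal amount)."""
    match = PRICE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized price format: %r" % text)
    # Drop the thousands separators before converting to Decimal.
    amount = Decimal(match.group('amount').replace(',', ''))
    return match.group('currency') or '', amount


# The strings scraped in the examples above.
scraped = ['US$85.00', 'US$285.00', 'US$1,282.00', 'US$3,333.00']
prices = [parse_price(p) for p in scraped]
for currency, amount in prices:
    print(currency, amount)
```

I used Decimal rather than float so that money amounts keep their exact cents. Anything that does not match the assumed shape raises a ValueError instead of being silently skipped, so you notice if the site changes its format.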

1 Comment

wow, it has been some time since I have seen such an elaborate and awesome answer. :+1:
