I'm trying to migrate some comments from a blog by web scraping with Python and BeautifulSoup. The content I'm looking for isn't in the HTML itself and seems to be generated by a script (which I can't find). I've seen some answers about this, but most of them are specific to a particular problem and I can't figure out how to apply them to my site. I'm just trying to scrape comments from pages like this one:

http://www.themasterpiececards.com/famous-paintings-reviewed/bid/92327/famous-paintings-duccio-s-maesta

I've also tried Selenium, but I'm using a Cloud9-based IDE currently and it doesn't seem to support web drivers.

I apologize if I botched any of the lingo; I'm pretty new to programming. If anyone has any tips, that would be helpful. Thanks!

  • Use the dryscrape library or PhantomJS. Commented Jan 23, 2018 at 1:58
  • Using Selenium is your best bet. I don't know about your IDE, but I recommend switching to PyCharm or another environment where the drivers are supported. Commented Jan 23, 2018 at 4:47

1 Answer

There are several ways to scrape content like this. One is to find out how the comments are loaded on this website. A quick look in the Chromium developer tools shows that the comments for the page mentioned are loaded via this API call.

This may not be suitable for you, since you may not be able to generate this URL for every page.
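As a rough illustration of the API-call approach, the sketch below builds a comments URL from a post URL and fetches it using only the standard library. The endpoint `https://example.com/comments`, the `post_id` parameter, and the idea that the number after `/bid/` identifies the post are all assumptions made for illustration; substitute whatever request the Network tab of the developer tools actually shows, and note that the response is only assumed to be JSON.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace with the request URL seen in the Network tab.
COMMENTS_ENDPOINT = "https://example.com/comments"


def post_id_from_url(page_url):
    """Return the numeric segment after '/bid/' in a post URL, or None."""
    match = re.search(r"/bid/(\d+)/", page_url)
    return match.group(1) if match else None


def build_comments_url(page_url, endpoint=COMMENTS_ENDPOINT):
    """Build the (assumed) comments URL for one blog post."""
    post_id = post_id_from_url(page_url)
    if post_id is None:
        raise ValueError("no '/bid/<number>/' segment in " + page_url)
    # 'post_id' is a made-up parameter name; use the one the real call uses.
    return endpoint + "?" + urlencode({"post_id": post_id})


def fetch_comments(page_url):
    """Fetch and decode the comments, assuming the endpoint returns JSON."""
    request = Request(
        build_comments_url(page_url),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

If the real call takes some identifier that does not appear in the page URL, this approach breaks down, which is the limitation mentioned above.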

Another, more reliable way is to render the JavaScript-generated content with a headless (GUI-less) browser. For ease of implementation, I would suggest using Scrapy with Splash. Splash is a JavaScript rendering service with an HTTP API, implemented in Python, which renders most of the content for your requests.
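A minimal sketch of the rendering approach follows. It calls Splash's `render.html` HTTP endpoint directly with the standard library instead of going through Scrapy, and it assumes a Splash instance is already running locally on its default port 8050. The helper names and the two-second `wait` are illustrative choices, not values taken from the site.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(page_url, wait=2.0, splash_url=SPLASH_URL):
    """Build a render.html request asking Splash to load page_url."""
    # 'wait' gives the page's scripts time to run before the HTML is returned.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_url}/render.html?{query}"


def fetch_rendered_html(page_url, wait=2.0):
    """Return the page's HTML after Splash has executed its JavaScript."""
    with urlopen(splash_render_url(page_url, wait), timeout=60) as response:
        return response.read().decode("utf-8")
```

The string returned by `fetch_rendered_html` can then be passed to BeautifulSoup in the usual way, since the comments should be present in the rendered markup if the wait is long enough.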
