I need to scrape a webpage, and I normally use Scrapy. I need to follow some links that are only revealed through JavaScript; they are nested inside several <ul> and <li> elements.

For example:

<ul class="level1">
   <li class="closed"> <---- this becomes "expanded" when opened
     <a href="javascript:etc...
       <ul class="level2">
         <li class="closed">
           <ul class="level3">
            <li class="track">
              <a href="this_is_the_url_that_I_want">

Do I need something other than Scrapy (I see that Selenium is often suggested), or can I use an XmlLinkExtractor? Or is there some way to extract the URL inside "level3" directly from the page source?
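For the last option, here is a rough sketch of what I have in mind. It assumes the nested lists are already present in the HTML the server returns, which I have not confirmed; if JavaScript only builds them after a click, this will find nothing. It uses only the Python 3 standard library so it can be run on its own. The names TrackLinkParser, extract_track_links and SAMPLE are made up for illustration, and SAMPLE is a tidied-up version of the markup above. Inside a Scrapy callback, I believe the equivalent would be something like response.xpath('//li[@class="track"]/a/@href').extract().

```python
from html.parser import HTMLParser


class TrackLinkParser(HTMLParser):
    """Collect href values of <a> tags whose nearest enclosing <li> has class "track"."""

    def __init__(self):
        super().__init__()
        self.li_stack = []  # one bool per open <li>: is it a "track" item?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self.li_stack.append("track" in classes)
        elif tag == "a" and self.li_stack and self.li_stack[-1]:
            href = attrs.get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "li" and self.li_stack:
            self.li_stack.pop()


# Hypothetical, well-formed version of the markup from the question.
SAMPLE = """
<ul class="level1">
  <li class="closed">
    <a href="javascript:void(0)">Level 1</a>
    <ul class="level2">
      <li class="closed">
        <a href="javascript:void(0)">Level 2</a>
        <ul class="level3">
          <li class="track">
            <a href="this_is_the_url_that_I_want">Track</a>
          </li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
"""


def extract_track_links(html):
    """Return every href found directly under a <li class="track"> item."""
    parser = TrackLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    print(extract_track_links(SAMPLE))
```

If this returns an empty list on the real page, the links are probably injected by JavaScript, and a browser-driven approach such as Selenium would be needed after all.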

Thanks

EDIT: I'm trying to use Selenium, but I get:

  File "/usr/lib/pymodules/python2.7/scrapy/spiderloader.py", line 40, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: '

I'm naming the spider, so I don't understand what I've done wrong.

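import scrapy
from selenium import webdriver


class audioSpider(scrapy.Spider):
    name = "audio"
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ["audio.sample"]
    start_urls = ["http://audio.sample/archive-project"]

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before starting the browser
        super(audioSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # Click the <a> elements themselves; href is an attribute, not a child node
        el1 = self.driver.find_element_by_xpath('//ul[@class="level1"]/li[@class]/a')
        el1.click()
        # Match any element carrying the container class (tag name assumed unknown)
        el2 = self.driver.find_element_by_xpath('//*[@class="subNavContainer loaded"]/ul[@class="level2"]/li[@class]/a')
        el2.click()
        el3 = self.driver.find_element_by_xpath('//*[@class="subNavContainer loaded"]/ul[@class="level3"]/li[@class="track"]/a')
        # Print the target URL rather than the WebElement object
        print(el3.get_attribute("href"))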
  • stackoverflow.com/questions/13436418/… I think this has been answered here. Commented Jun 22, 2016 at 15:10
  • OK, I'll try that. Commented Jun 22, 2016 at 15:18
  • No problem. Consider marking the question as a duplicate if that was your answer! Good luck. Commented Jun 22, 2016 at 15:21
  • I've just updated the post Commented Jun 23, 2016 at 12:17
  • You are calling scrapy crawl audio, right? Commented Jun 23, 2016 at 12:42
