
I would like to scrape the data of this web site ( http://www.oddsportal.com/matches/soccer ) in order to get a plain text file with the match info and the odds info in this way:

00:30   Criciuma - Atletico-PR                    1:2   2.70    3.24    2.41    
10:45   Vier-und Marschlande - Concordia Hamburg  0:0   4.00    3.53    1.68    
10:45   Germania Schnelsen - ASV Bergedorf 85     2:3   1.95    3.37    3.23    
10:45   Barmbecker SG - Altona                    0:2   3.67    3.37    1.82

I used to do this with w3m, but it seems they have switched from plain HTML to Javascript, and w3m no longer works. The data are contained in a single div; this is one entry:

<tr xeid="862487"><td class="table-time datet t1333724400-1-1-0-0 ">17:00</td><td class="name table-participant" colspan="2"><a href="/soccer/italy/serie-b-2011-2012/brescia-marmi-lanza-verona-862487/">Brescia - Verona</a></td><td class="odds-nowrp" xoid="40456791" xodd="xzc0fxzxa">-</td><td class="odds-nowrp" xoid="40456793" xodd="cz0ofxz9c">-</td><td class="odds-nowrp" xoid="40456792" xodd="cz9xfcztx">-</td><td class="center info-value">17</td></tr>

What can I do?
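For reference, once the rendered HTML is in hand (by whatever means), a row in the shape above can be pulled apart with nothing but stdlib regexes. A minimal sketch, assuming exactly the markup shown, where the odds cells still read `-` until Javascript fills them in:

```ruby
# Sketch: extract time, teams and odds cells from one row of the table.
# The row below is the sample markup from the question, verbatim.
row = '<tr xeid="862487"><td class="table-time datet t1333724400-1-1-0-0 ">17:00</td>' \
      '<td class="name table-participant" colspan="2">' \
      '<a href="/soccer/italy/serie-b-2011-2012/brescia-marmi-lanza-verona-862487/">Brescia - Verona</a></td>' \
      '<td class="odds-nowrp" xoid="40456791" xodd="xzc0fxzxa">-</td>' \
      '<td class="odds-nowrp" xoid="40456793" xodd="cz0ofxz9c">-</td>' \
      '<td class="odds-nowrp" xoid="40456792" xodd="cz9xfcztx">-</td>' \
      '<td class="center info-value">17</td></tr>'

time  = row[/class="table-time[^"]*">([^<]+)</, 1]            # "17:00"
match = row[/table-participant.*?<a [^>]*>([^<]+)</, 1]       # "Brescia - Verona"
odds  = row.scan(/class="odds-nowrp"[^>]*>([^<]+)</).flatten  # ["-", "-", "-"]

puts format('%-7s %-41s %s', time, match, odds.join('    '))
```

This only covers the parsing step; it does nothing about getting the Javascript-rendered page in the first place, which is the actual problem here.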

  • Can you provide more information about how they are using Javascript? That will dictate potential solutions. Commented Apr 6, 2012 at 13:40
  • I still see the values in the HTML source. Commented Apr 6, 2012 at 13:55
  • @Fenisko I can't. How is that possible? Commented Apr 6, 2012 at 13:56
  • No idea. In Firefox I can see the table in recognizable HTML. So I guess 20 minutes work with BeautifulSoup ;-). Commented Apr 6, 2012 at 14:02
  • @Fenisko - just because you can see it in Firefox does not mean it is in the response. Commented Apr 6, 2012 at 18:53

2 Answers


The easiest way (though maybe not the best) is to use Selenium or Watir, which drive a real browser. In Ruby I would do:

require 'watir-webdriver'
require 'csv'

# Drive a real browser so the site's Javascript runs and fills in the odds.
@browser = Watir::Browser.new
@browser.goto 'http://www.oddsportal.com/matches/soccer/'

CSV.open('out.csv', 'w') do |out|
  # Match rows carry a class containing "deactivate"; dump each row's cells.
  @browser.trs(:class => /deactivate/).each do |tr|
    out << tr.tds.map(&:text)
  end
end

1 Comment

Yes, there's also JRuby and HtmlUnit. I think you'll find that /odd/ will only give odd-numbered rows.

If they are using Javascript to fetch data from a service and render it within the div, w3m will not show the div updated with that data, because it does not support Javascript.

You have two choices:

  • Reverse-engineer their Javascript to find out where the data is coming from, and see if you can query that data source directly to get the XML or JSON they're using to update the DIV. Then you can skip the scraping entirely. They might not want you doing that, however, and may have secured the data source to prevent it. Or they might not have.

  • Use a browser which executes Javascript before you start your scraping; that way the div will already be populated with the data. w3m-js might do this for you, or you might want to try something else (lynx or links). This question seems to be related.
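If the first route works out, the only remaining step is reshaping whatever JSON the service returns into the plain-text lines the question asks for. A sketch with an entirely hypothetical payload shape: the field names (`time`, `match`, `score`, `odds`) are guesses, since the real feed's structure is unknown.

```ruby
require 'json'

# Hypothetical payload -- the real service's field names are unknown,
# so this shape is only a placeholder for illustration.
payload = '{"matches":[{"time":"00:30","match":"Criciuma - Atletico-PR",' \
          '"score":"1:2","odds":[2.70,3.24,2.41]}]}'

lines = JSON.parse(payload)['matches'].map do |m|
  format('%-7s %-41s %-5s %s',
         m['time'], m['match'], m['score'],
         m['odds'].map { |o| format('%.2f', o) }.join('    '))
end

puts lines
```

The formatting mirrors the columns in the desired output above; adjust the field widths once the real feed's contents are known.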

ETA: Maybe PhantomJS would help here?

6 Comments

I don't know how to get data from their service. What do you mean by "use a browser which executes Javascript before you start your scraping"? I need to do this automatically, to collect data at different times.
If you look at the source JS which is building the content in their div, it might indicate where it's getting the data. You could get the same data (in XML or JSON) and skip the scraping if they haven't secured it. As far as the browser goes: because they're using JS to render the data, they're counting on their viewers having JS enabled. W3M does not support JS, so it's not rendering the data. I'll update my answer accordingly.
w3m-js seems to have disappeared from the web :(
I agree with what you say except for the part about securing the data. If you can see the data in a browser, then you can scrape it.
Maybe. I can imagine the service being set up to require certain criteria (e.g. a cookie or similar session token) in the request; such criteria could certainly be imitated or spoofed somehow, but it would make regularly sipping data from the service somewhat less simple.
