javascript web-scraping w3m

Scraping a simple JavaScript page


I would like to scrape the data from this website (http://www.oddsportal.com/matches/soccer) to get a plain text file with the match info and the odds, like this:

    00:30   Criciuma - Atletico-PR                    1:2   2.70    3.24    2.41
    10:45   Vier-und Marschlande - Concordia Hamburg  0:0   4.00    3.53    1.68
    10:45   Germania Schnelsen - ASV Bergedorf 85     2:3   1.95    3.37    3.23
    10:45   Barmbecker SG - Altona                    0:2   3.67    3.37    1.82
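Once the fields have been extracted, the fixed-width layout above can be produced with `String#ljust`. This is a minimal sketch: `format_match` is a hypothetical helper, and the column widths (8, 42, 6 and 8) are assumptions chosen to reproduce the sample lines.

```ruby
# Hypothetical helper: lay out one match in the fixed-width format shown above.
# Column widths are assumptions picked to match the sample lines.
def format_match(time, teams, score, odds)
  cols = [time.ljust(8), teams.ljust(42), score.ljust(6)]
  cols.concat(odds.map { |o| o.ljust(8) })
  cols.join.rstrip # drop the padding after the last odds column
end

# Example data copied from the desired output above.
rows = [
  ['00:30', 'Criciuma - Atletico-PR', '1:2', %w[2.70 3.24 2.41]],
  ['10:45', 'Barmbecker SG - Altona', '0:2', %w[3.67 3.37 1.82]]
]

rows.each { |r| puts format_match(*r) }
```

`ljust` pads but never truncates, so a team name longer than 42 characters pushes the later columns to the right. Redirect standard output to a file to get the plain text file.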

I used to do this with w3m, but the site now seems to build the page with JavaScript, and w3m (which does not run JavaScript) no longer works. The data are all contained in a single div. This is one entry:

    <tr xeid="862487">
      <td class="table-time datet t1333724400-1-1-0-0 ">17:00</td>
      <td class="name table-participant" colspan="2"><a href="/soccer/italy/serie-b-2011-2012/brescia-marmi-lanza-verona-862487/">Brescia - Verona</a></td>
      <td class="odds-nowrp" xoid="40456791" xodd="xzc0fxzxa">-</td>
      <td class="odds-nowrp" xoid="40456793" xodd="cz0ofxz9c">-</td>
      <td class="odds-nowrp" xoid="40456792" xodd="cz9xfcztx">-</td>
      <td class="center info-value">17</td>
    </tr>
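A quick sketch of what can and cannot be read from that static row without a browser. Two things are assumptions inferred from this one sample: that the `t1333724400` token in the time cell's class is a Unix timestamp (it matches the displayed 17:00 at UTC+2), and that the `-` in the odds cells is replaced by JavaScript, presumably from the encoded-looking `xodd` attributes. Regular expressions are only acceptable here because it is a single known row; for the whole page an HTML parser such as Nokogiri would be more robust.

```ruby
# One match row exactly as quoted in the question (static HTML, before JavaScript runs).
ROW = <<~'HTML'
  <tr xeid="862487"><td class="table-time datet t1333724400-1-1-0-0 ">17:00</td><td class="name table-participant" colspan="2"><a href="/soccer/italy/serie-b-2011-2012/brescia-marmi-lanza-verona-862487/">Brescia - Verona</a></td><td class="odds-nowrp" xoid="40456791" xodd="xzc0fxzxa">-</td><td class="odds-nowrp" xoid="40456793" xodd="cz0ofxz9c">-</td><td class="odds-nowrp" xoid="40456792" xodd="cz9xfcztx">-</td><td class="center info-value">17</td></tr>
HTML

# Assumption: the "t<digits>" token in the time cell's class is a Unix timestamp.
# 1333724400 is 2012-04-06 15:00 UTC, i.e. the displayed 17:00 at UTC+2.
epoch   = ROW[/\bt(\d+)-/, 1].to_i
kickoff = Time.at(epoch).getlocal('+02:00').strftime('%H:%M')

# The team names are the text of the link in the participant cell.
teams = ROW[%r{<a href="[^"]*">([^<]+)</a>}, 1]

# The odds cells hold only "-" in the static HTML, so the actual odds are
# not available without running the page's JavaScript.
odds = ROW.scan(%r{<td class="odds-nowrp"[^>]*>([^<]*)</td>}).flatten

puts [kickoff, teams, *odds].join("\t")
```

The time and teams are recoverable from the raw HTML, but the odds are not, which is why a tool that executes JavaScript is needed.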

What can I do?


Solution

  • The easiest way (though maybe not the best) is to drive a real browser with Selenium/Watir, so the page's JavaScript runs and fills in the table before you read it. In Ruby I would do:

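    require 'watir' # watir-webdriver has since been merged into the watir gem
    require 'csv'

    # Drive a real browser so the page's JavaScript runs before reading the table.
    browser = Watir::Browser.new
    begin
      browser.goto 'http://www.oddsportal.com/matches/soccer/'
      CSV.open('out.csv', 'w') do |out|
        # Rows whose class matches /deactivate/; adjust if the site's markup changes.
        browser.trs(class: /deactivate/).each do |tr|
          out << tr.tds.map(&:text)
        end
      end
    ensure
      browser.close # always shut the browser down, even if scraping fails
    end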