javascript phantomjs data-scrubbing

PhantomJS always returns "Page not found" on a particular website


I'm trying to get sports match results from this website:

http://www.oddsportal.com

but every address on this website that I try returns "Page not found" in PhantomJS, while the same address opens fine in my own browser.

Here is some sample code:

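var webPage = require('webpage');
var page = webPage.create();

// Forward console messages from the page context to the PhantomJS console.
page.onConsoleMessage = function(msg) {
  console.log(msg);
};

page.open('http://oddsportal.com', function(status) {

  // Print the body of the loaded page so the returned markup can be inspected.
  page.evaluate(function() {
    console.log(document.getElementsByTagName('body')[0].innerHTML);
  });
  phantom.exit();

});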

I don't know how they are blocking PhantomJS, and I have no idea where to start.

Is there anything in the PhantomJS headers that would alert them?

I'd appreciate any suggestions or advice on how I can solve this.

Here is the output for that website:

                                    <a href="http://www.oddsportal.com">
                                        <img src="logo.jpg" />
 </p>

                                    <div id="main" class="home">
                                        <div id="breadcrumb">
                                            <strong>The page you requested is not available.</strong>
                                        </div>
                                    <hr class="hidden">
                                        <div id="col-content">
                                            <h1>Page not found</h1>
                                            <p>This page not exist on OddsPortal.com!</p>
                        </div>
                                        <div class="break"></div>
                                        <hr class="hidden">
                                    </div>
                                    <div id="footer">
                                        <p class="l">Copyright © 2008-12 OddsPortal.com (v)</p>
                                        <div class="break"></div>
                        </div>

Solution

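  • Try changing the user agent using page.settings.userAgent. The default PhantomJS user agent string contains "PhantomJS", which a site can use to detect and block it: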

    var webPage = require('webpage');
    var page = webPage.create();
    
    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';
    
    //...
    

    Source: PhantomJS Docs
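
    To check whether the user agent change actually helps, something like the sketch below could be used. The helpers revealsPhantom and looksLikeNotFound are hypothetical names introduced only for this illustration; the second one simply looks for the "Page not found" heading shown in the question's output. The site may use other checks besides the user agent, so treat this as a starting point under those assumptions rather than a guaranteed fix.

```javascript
// Hypothetical helper: true when a user agent string advertises PhantomJS.
function revealsPhantom(userAgent) {
  return /PhantomJS/i.test(String(userAgent));
}

// Hypothetical helper: true when the HTML contains the error heading
// that appears in the output posted in the question.
function looksLikeNotFound(html) {
  return /<h1>\s*Page not found\s*<\/h1>/i.test(String(html));
}

// The part below only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();

  // Inspect the default user agent before overriding it.
  console.log('Default UA reveals PhantomJS: ' +
    revealsPhantom(page.settings.userAgent));

  // Same user agent string as in the answer above.
  page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Chrome/37.0.2062.120 Safari/537.36';

  page.open('http://www.oddsportal.com', function (status) {
    if (status !== 'success') {
      console.log('Failed to load the page');
    } else {
      // page.content holds the HTML of the loaded page.
      console.log('Still blocked: ' + looksLikeNotFound(page.content));
    }
    phantom.exit();
  });
}
```

    If "Still blocked" prints true even with the new user agent, the site is probably looking at something other than the user agent string.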