automation phantomjs casperjs slimerjs

Can't open an HTTPS website using SlimerJS, CasperJS, or PhantomJS


This is the first time I have been unable to open a website using a headless browser such as PhantomJS, SlimerJS, or CasperJS. I just want to open the website, so I wrote a very basic script that opens it and takes a screenshot, but all three of them give me a blank picture.

I tried the following options:

--debug=true
--ssl-protocol=TLSv1.2 (I tried each of the available protocols)
--ignore-ssl-errors=true

Here are my scripts:

SlimerJS

var page = require("webpage").create();
// Set the viewport before loading so the page is laid out at this size.
page.viewportSize = { width: 1024, height: 768 };
page.open("https://domain/")
    .then(function (status) {
        if (status === "success") {
            page.render("screenshot.png");
        } else {
            console.log("Sorry, the page is not loaded");
        }
        page.close();
        phantom.exit();
    });

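PhantomJS

var page = require('webpage').create();
page.open('https://domain/', function (status) {
  // Check the load status before rendering.
  if (status === 'success') {
    page.render('screenshot.png');
  } else {
    console.log('Sorry, the page is not loaded');
  }
  phantom.exit();
});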

CasperJS

var casper = require('casper').create({
  viewportSize: {width: 950, height: 950}
});

casper.start('https://domain/', function() {
    this.capture('screenshot.png');
});

casper.run();

I even tried screen capture services to see whether they could open the site, but all of them gave me nothing either.

Is there something I am missing?


Solution

  • The issue is not caused by PhantomJS as such. The site you are checking is protected by an F5 network protection mechanism:

    https://devcentral.f5.com/articles/these-are-not-the-scrapes-youre-looking-for-session-anomalies

    So it's not that the page doesn't load. The protection mechanism detects that PhantomJS is a bot, based on checks it has implemented.

    [Screenshot: page loaded]

    The easiest fix is to use Chrome instead of PhantomJS. Otherwise, expect to spend a decent amount of time investigating.
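As a rough illustration of the Chrome route, here is a minimal Node.js sketch that drives Chrome's built-in headless screenshot mode. It assumes Chrome or Chromium is installed and reachable through a `CHROME_BIN` environment variable or under the name `google-chrome`; both names, the helper `buildChromeArgs`, and the output file are assumptions for illustration. Whether headless Chrome passes this particular site's checks is not something the answer confirms, so a regular (non-headless) Chrome may still be needed.

```javascript
// Sketch: take a screenshot with headless Chrome from Node.js.
// Assumptions: Chrome/Chromium is installed, and CHROME_BIN (or the
// fallback name "google-chrome") points at the binary on this machine.
var execFile = require('child_process').execFile;

// Build the command-line arguments for a one-shot headless screenshot.
function buildChromeArgs(url, outFile, width, height) {
  return [
    '--headless',
    '--disable-gpu',
    '--window-size=' + width + ',' + height,
    '--screenshot=' + outFile,
    url
  ];
}

// Only launch Chrome when a URL is passed on the command line:
//   node screenshot.js https://domain/
var targetUrl = process.argv[2];
if (targetUrl) {
  var chromeBin = process.env.CHROME_BIN || 'google-chrome';
  var args = buildChromeArgs(targetUrl, 'screenshot.png', 1024, 768);
  execFile(chromeBin, args, function (error) {
    if (error) {
      console.log('Chrome failed: ' + error.message);
    } else {
      console.log('Saved screenshot.png');
    }
  });
}
```

The argument list is built in a separate function so the flags can be inspected or adjusted without launching a browser.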

    Some similar questions from the past, answered and unanswered:

    Selenium and PhantomJS : webpage thinks Javascript is disabled

    PhantomJS get no real content running on AWS EC2 CentOS 6

    file_get_contents while bypassing javascript detection

    Python POST Request Not Returning HTML, Requesting JavaScript Be Enabled

    I will update this post with more details as I find them. But in my experience, it is better to go with what works than to waste time on sites that don't work under PhantomJS.

    Update-1

    I tried importing the browser cookies into PhantomJS and it still doesn't work, which means there are some hard checks in place.

    [Screenshot: cookies]
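For readers who want to repeat the cookie experiment, the following is a minimal sketch of one way to import browser cookies into PhantomJS; it is not necessarily how it was done for the update above, and per that update it did not get past the protection. It assumes the cookies were exported to a hypothetical `cookies.json` file as a JSON array with `name`, `value`, `domain`, `path`, `secure`, `httpOnly`, and `expirationDate` (seconds since the epoch) fields, which is the shape some cookie-export extensions produce. The helper name `toPhantomCookie` is also made up for this example.

```javascript
// Sketch: convert exported browser cookies into the object shape
// accepted by PhantomJS's phantom.addCookie().
// Assumption: each exported entry has name, value, domain, path, secure,
// httpOnly and expirationDate (seconds since the epoch).
function toPhantomCookie(exported) {
  var cookie = {
    name: exported.name,
    value: exported.value,
    domain: exported.domain,
    path: exported.path || '/',
    httponly: !!exported.httpOnly,
    secure: !!exported.secure
  };
  if (exported.expirationDate) {
    // PhantomJS expects the expiry in milliseconds since the epoch.
    cookie.expires = Math.round(exported.expirationDate * 1000);
  }
  return cookie;
}

// Under PhantomJS only: read the hypothetical cookies.json and register
// each cookie before calling page.open().
if (typeof phantom !== 'undefined') {
  var fs = require('fs');
  var exportedCookies = JSON.parse(fs.read('cookies.json'));
  exportedCookies.forEach(function (c) {
    // addCookie returns false when PhantomJS rejects the cookie.
    if (!phantom.addCookie(toPhantomCookie(c))) {
      console.log('Cookie rejected: ' + c.name);
    }
  });
}
```

This fragment would go at the top of the PhantomJS script, ahead of `page.open`, so the cookies are in place for the first request.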