java htmlunit htmlunit-driver

HtmlUnit and HTTPS pages


I'm trying to write a program that checks for available positions and books the first available one. I ran into a problem pretty early: when I try to connect to the site (which uses HTTPS), the program doesn't do anything. It doesn't throw an error and it doesn't crash. The weirdest thing is that it works with some HTTPS websites but not with others. I've spent countless hours trying to resolve this. I also tried HtmlUnitDriver, and it still doesn't work. Please help.

final WebClient webc = new WebClient(BrowserVersion.CHROME);
webc.getCookieManager().setCookiesEnabled(true);
HtmlPage loginpage = webc.getPage(loginurl);
System.out.println(loginpage.getTitleText());

I'm getting really frustrated with this. Thank you in advance.


Solution

  • As far as I can see, this has nothing to do with HTTPS. It is a good idea to do some traffic analysis using Charles or Fiddler. Here is what you can see:

    The page the server returns in response to your first call to https://online.enel.pl/ loads some external JavaScript. And then the story begins.

    The JS looks like this:

    (function() {
        var z = "";
        var b = "766172205f3078666.....";
        eval((function() {
            for (var i = 0; i < b.length; i += 2) {
                z += String.fromCharCode(parseInt(b.substring(i, i + 2), 16));
            }
            return z;
        })());
    })();
    

    As you can see, someone wants to hide the real JavaScript that gets executed.
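
    To read the hidden script without running it, the same decoding step can be reproduced outside the browser. The following is a minimal sketch in Java; the class name `HexPayloadDecoder` is made up for illustration, and the input is only the first few bytes of the payload shown above (the rest is elided in the answer).

```java
final class HexPayloadDecoder {

    /** Turns a hex string (two digits per character) into the text it encodes. */
    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        StringBuilder out = new StringBuilder(hex.length() / 2);
        for (int i = 0; i < hex.length(); i += 2) {
            // Same step as String.fromCharCode(parseInt(b.substring(i, i + 2), 16)) in the JS
            out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the leading bytes of the payload; the full string is not reproduced here.
        System.out.println(decode("766172205f3078")); // prints: var _0x
    }
}
```

    Unlike the original wrapper, this only returns the decoded text and never evaluates it.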

    The next step is to look at the JavaScript produced by this simple decoding.

    It is really huge and looks like this:

    var _0xfbfd = ['\x77\x71\x30\x6b\x77 ....
    (function (_0x2ea96d, _0x460da4) {
        var _0x1da805 = function (_0x55e996) {
            while (--_0x55e996) {
                _0x2ea96d['\x70\x75\x73\x68'](_0x2ea96d['\x73\x68\x69\x66\x74']());
            }
        };
    .....
    

    OK, now we have obfuscated JavaScript. If you like, you can start with http://ddecode.com/hexdecoder/ to get more readable text, but this is where I stopped my analysis. It looks like this script either does some really bad things or someone still believes in security by obscurity.
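
    As an offline alternative to an online hex decoder, the `\xNN` escapes can be replaced locally. This is a rough sketch, assuming plain two-digit hex escapes only; the class name `HexEscapeDecoder` is hypothetical. The output is meant for reading, not for running: a decoded quote or backslash could break a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexEscapeDecoder {

    // Matches a literal backslash, an 'x', and exactly two hex digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9a-fA-F]{2})");

    /** Replaces every \xNN escape in the given source text with the character it stands for. */
    static String decodeEscapes(String source) {
        Matcher matcher = HEX_ESCAPE.matcher(source);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the decoded text from being treated specially
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "_0x2ea96d['\\x70\\x75\\x73\\x68'](_0x2ea96d['\\x73\\x68\\x69\\x66\\x74']());";
        System.out.println(decodeEscapes(line));
        // prints: _0x2ea96d['push'](_0x2ea96d['shift']());
    }
}
```

    Applied to the line from the excerpt above, this shows that the loop simply rotates the array by calling `push` and `shift`.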

    If you run this with HtmlUnit, the code gets interpreted: the decoding works and the code runs. Sadly, the code then runs endlessly (maybe because of an error or some incompatibility with real browsers).
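
    To confirm from the Java side that the call hangs rather than silently finishing, the page load can be run on a worker thread with a time limit. The following is a generic sketch using only the JDK; `HangProbe` and `callWithTimeout` are hypothetical names, and the tasks in `main` are stand-ins. With HtmlUnit on the classpath, the task would be something like `() -> webc.getPage(loginurl)` from the question. Cancelling only interrupts the worker, so a script stuck in a loop may keep running in the background. HtmlUnit's `WebClient.setJavaScriptTimeout` may be another option, but it is not exercised here.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HangProbe {

    /** Runs the task on a daemon worker thread; returns empty if it does not finish in time. */
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "hang-probe");
            worker.setDaemon(true); // a stuck task must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: only interrupts the worker
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page load that completes normally.
        Optional<String> fast = callWithTimeout(() -> "some title", 2, TimeUnit.SECONDS);
        System.out.println(fast.isPresent() ? "finished" : "timed out");

        // Stand-in for a page load that never returns.
        Optional<String> stuck = callWithTimeout(() -> {
            Thread.sleep(5_000);
            return "never seen";
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println(stuck.isPresent() ? "finished" : "timed out");
    }
}
```

    An empty result after a generous timeout suggests the script is still executing, which matches the endless run described above.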

    If you want to get this working, you have to figure out where the error is and open a bug report for HtmlUnit. To do this, you can start with a small local HTML file that includes the code from the first external JavaScript file. Then add some log statements to get the decoded version, replace the encoded script with the decoded version, and try to understand what is going on. You can add alert statements and check whether the code in HtmlUnit follows the same path as it does in browsers. Sorry, but my time is too limited to do all this work; I would really like to help/fix this if you can point to a specific function in HtmlUnit that works differently from real browsers.
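
    The first of these steps, a small local HTML file that decodes the payload and logs it instead of evaluating it, could be generated along these lines. This is only a sketch: `LocalTestPage` is a hypothetical helper, the payload passed in `main` is just the first few bytes shown earlier, and where `console.log` output ends up depends on the environment that loads the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LocalTestPage {

    /** Builds an HTML page that decodes the hex payload and logs the result. */
    static String build(String hexPayload) {
        if (!hexPayload.matches("[0-9a-fA-F]*")) {
            throw new IllegalArgumentException("payload must contain hex digits only");
        }
        return String.join("\n",
            "<!DOCTYPE html>",
            "<html><head><title>decode test</title></head><body>",
            "<script>",
            "(function() {",
            "    var z = \"\";",
            "    var b = \"" + hexPayload + "\";",
            "    for (var i = 0; i < b.length; i += 2) {",
            "        z += String.fromCharCode(parseInt(b.substring(i, i + 2), 16));",
            "    }",
            "    // Log the decoded source instead of executing it",
            "    console.log(z);",
            "})();",
            "</script>",
            "</body></html>");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("decode-test", ".html");
        Files.write(file, build("766172205f3078").getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

    Once the decoded text has been captured, the logging script can be swapped for the decoded source and further log or alert statements added step by step.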