c++ web-scraping wininet

C++: fetching a web page using WinInet


I'm trying to download a web page using WinInet. I've used the code given here: http://www.cplusplus.com/forum/windows/109799/

It mostly works, but there seems to be some encoding issue that I have no idea how to fix.

For instance, this line (using www.stackoverflow.com as an example page):

<link rel="stylesheet" type="text/css" href="https://cdn.sstatic.net/Shared/stacks.css?v=48511da708b8">

is returned as this line:

<link rel="stylesheet" type="text/css" href="https://cdn.sstatic.net/Shared/stacks.css?cks.css?ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌðA6÷v=48511da708b8">

(To avoid spamming, I've removed most of the special characters.)


Solution

  • In this code:

    while(InternetReadFile(OpenAddress, DataReceived, 4096, &NumberOfBytesRead) && NumberOfBytesRead )
    {
        cout << DataReceived;
    }
    

    DataReceived receives arbitrary bytes and is not null-terminated, but the code passes it to the operator<< overload for const char*, which expects a null-terminated string. Printing therefore runs past the end of the received data and emits bytes from the surrounding memory until it happens to hit a 0x00 byte. The long run of Ì characters is most likely the 0xCC pattern that MSVC debug builds use to fill uninitialized stack memory.

    Use the ostream::write() method instead, so that you can tell it exactly how many characters to print:

    cout.write(DataReceived, NumberOfBytesRead);