c++httpboostboost-asiochunked-encoding

Tried to parse chunked transfer encoding,it's not working though, the file which I decoded is totally unreadable


I tried to parse the data which was generated by chunked transfer encoding in a Rest API ,I did see the data has value when I tried to print the value in a string and I thought it should be working,but when I tried to assign the value to the file, the file is totally unreadable, the code below I used boost library and I gonna elaborate my thoughts in the code , we gonna get started from the response portion of my code, I have no idea what wrong I have done

   // Send the request.
    boost::asio::write(socket, request);

    // Read the response status line. The response streambuf will automatically
    // grow to accommodate the entire line. The growth may be limited by passing
    // a maximum size to the streambuf constructor.
    boost::asio::streambuf response;
    boost::asio::read_until(socket, response, "\r\n");

    // Check that response is OK.
    std::istream response_stream(&response);
    std::string http_version;
    response_stream >> http_version;
    unsigned int status_code;
    response_stream >> status_code;
    std::string status_message;
    std::getline(response_stream, status_message);
    if (!response_stream || http_version.substr(0, 5) != "HTTP/")
    {
        //std::cout << "Invalid response\n";
        return 9002;
         
    }
    if (status_code != 200)
    {
        //std::cout << "Response returned with status code " << status_code << "\n";
        return 9003;
    }
    
    // Read the response headers, which are terminated by a blank line.
    boost::asio::read_until(socket, response, "\r\n\r\n");

    // Process the response headers.
    //this portion of code I tried to parse the file name in the header of response which the file name is in the  content-disposition of header
    std::string header;
    std::string fullHeader = "";
    string zipfilename="", txtfilename="";
    bool foundfilename = false;
    while (std::getline(response_stream, header) && header != "\r")
    {
        fullHeader.append(header).append("\n");
        std::transform(header.begin(), header.end(), header.begin(),
            [](unsigned char c){ return std::tolower(c); });
        string containstr = "content-disposition";
        string containstr2 = "filename";
        string quotestr = "\"";
        if (header.find(containstr) != std::string::npos && header.find(containstr2) != std::string::npos)
        {
            int countquotes = 0;
            bool foundquote = true;
            
            std::size_t startpos = 0, beginpos, endpos;
            while (foundquote)
            {
                
                std::size_t myfound = header.find(quotestr, startpos);
                if (myfound != std::string::npos)
                {
                    if (countquotes % 2 == 0)
                        beginpos = myfound;
                    else
                    {
                        endpos = myfound;
                        foundfilename = true;
                    }

                    startpos = myfound + 1;
                    
                }
                else
                   foundquote = false;

                countquotes++;
            }

            if (endpos > beginpos && foundfilename)
            {
                size_t zipfileleng = endpos - beginpos;
                zipfilename = header.substr(beginpos+1, zipfileleng-1);
                txtfilename = header.substr(beginpos+1, zipfileleng-5);
            }
            else
                return 9004;

        }
    }

    if (foundfilename == false || zipfilename.length() == 0 || txtfilename.length() == 0)
        return 9005;

     //when the zipfilename has been found, we gonna get the data from the body of response, due to the response was  chunked transfer encoding, I tried to parse it,it's not complicated due to I saw it on the Wikipedia, it just first line was length of data,the next line was data,and it's the loop which over and over again ,all I tried to do was spliting all the data from the body of response by "\r\n" into a vector<string>, and I gonna read the data from that vector

      // Write whatever content we already have to output.
    std::string fullResponse = "";
    if (response.size() > 0)
    {
        std::stringstream ss;
        ss << &response;
        fullResponse = ss.str();
     
    
    }
    //tried split the entire body of response into a vector<string>

     vector<string> allresponsedata;
    split_regex(allresponsedata, fullResponse, boost::regex("(\r\n)+"));
    
    //tried to merge the data of response
    string zipfiledata;
    int myindex = 0;
    for (auto &x : allresponsedata) {
        std::cout << "Split: " << x << std::endl;// I tried to print the data, I did see the value in the variable of x

        if (myindex % 2 != 0)
        {
            zipfiledata = zipfiledata + x;//tried to accumulate the datas
        }


        myindex++;
    }
    
    //tried to write the data into a file
    std::ofstream zipfilestream(zipfilename, ios::out | ios::binary);
    zipfilestream.write(zipfiledata.c_str(), zipfiledata.length());
    zipfilestream.close();

    //afterward, the zipfile was built, but it's unreadable which it's not able to open,the zip utlities software says it's a damaged zip file though

I even tried something else ways like this slow http client based on boost::asio - (Chunked Transfer) ,but this way is not working as well ,VS says

  1 IntelliSense: no instance of overloaded function "boost::asio::read" matches the argument list
        argument types are: (boost::asio::ip::tcp::socket, boost::asio::streambuf, boost::asio::detail::transfer_exactly_t, std::error_code)    

it just NOT able to compile in the line which is

size_t n = asio::read(socket, response, asio::transfer_exactly(chunk_bytes_to_read), error);

even I have read the example of asio::transfer_exactly, there's no exactly example like this though https://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/transfer_exactly.html

any idea?


Solution

  • I don't see you read the format correctly: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Format

    You need to read the chunk length (in hex) and any optional chunk extensions before accumulating the full response body.

    It needs to be done before, because the sequence \r\n that you split on can easily appear inside the chunk data.

    Again, I recommend to just use Beast's support, making it all a simple

     http::response<http::string_body> response;
     boost::asio::streambuf buf;
     http::read(socket, buf, response);
    

    And you will have the headers fully parsed, interpreted (including Trailer headers!) and the content in response.body() as a std::string.

    It will do the right thing even if the server doesn't use chunked encoding or combines with different encoding options.

    There's simply no reason to reinvent the wheel.

    Full Demo

    This demonstrates with the Chunked Encoding test url from https://jigsaw.w3.org/HTTP/:

    #include <boost/process.hpp>
    #include <boost/beast.hpp>
    #include <iostream>
    namespace http = boost::beast::http;
    using boost::asio::ip::tcp;
    
    int main() {
        http::response<http::string_body> response;
    
        boost::asio::io_context ctx;
        tcp::socket socket(ctx);
    
        connect(socket, tcp::resolver{ctx}.resolve("jigsaw.w3.org", "http"));
    
        http::write(
                socket,
                http::request<http::empty_body>(
                    http::verb::get, "/HTTP/ChunkedScript", 11));
    
        boost::asio::streambuf buf;
        http::read(socket, buf, response);
    
        std::cout << response.body() << "\n";
        std::cout << "Effective headers are:" << response.base() << "\n";
    }
    

    Printing

    This output will be chunked encoded by the server, if your client is HTTP/1.1
    Below this line, is 1000 repeated lines of 0-9.
    -------------------------------------------------------------------------
    01234567890123456789012345678901234567890123456789012345678901234567890
    01234567890123456789012345678901234567890123456789012345678901234567890
    ...996 lines removed ...
    01234567890123456789012345678901234567890123456789012345678901234567890
    01234567890123456789012345678901234567890123456789012345678901234567890
    
    Effective headers are:HTTP/1.1 200 OK
    cache-control: max-age=0
    date: Wed, 31 Mar 2021 20:09:50 GMT
    transfer-encoding: chunked
    content-type: text/plain
    etag: "1j3k6u8:tikt981g"
    expires: Wed, 31 Mar 2021 20:09:49 GMT
    last-modified: Mon, 18 Mar 2002 14:28:02 GMT
    server: Jigsaw/2.3.0-beta3