Tags: c++, utf-8, cpp-netlib

How to read UTF-8 encoding with cpp-netlib when it is not specified in the HTML head


I'm trying to get the content of some websites using cpp-netlib (plus Boost, on Linux). Both cpp-netlib and Boost are the latest versions (installed this week, no compilation problems).

The point is: from some sites I get the correct UTF-8 encoding (characters like ç, á, î, etc. show up correctly). From other sites, these characters come out as "?" inside black diamonds. I have noticed that the former have an explicit HTML meta tag in the head declaring the UTF-8 encoding, while the others don't.

I have tried a few things with the request headers in my code, after going through the docs and Google a bit, but since I don't really know what I'm doing, I had no success.

I'm using very simple code, as given in the standard examples:

  #include <iostream>
  #include <string>
  #include <boost/network/protocol/http/client.hpp>

  using namespace boost;           // so network::http::client resolves
  using namespace boost::network;  // as in the cpp-netlib examples (for body())

  int main() {
      std::string url = "...";     // the site to fetch

      network::http::client client;
      network::http::client::request request(url);
      //boost::network::add_header(request, "Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
      request << network::header("Connection", "close");
      //request << boost::network::header("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
      //request << boost::network::header("Accept", "application/x-www-form-urlencoded; charset=utf-8");
      network::http::client::response response = client.get(request);
      std::string content = body(response);
      std::cout << content;
  }

The commented-out parts are the things I tried in order to "change the header" so that the content would be treated as UTF-8 by the request (or so I thought).
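As a side note, one thing that can be checked is what charset a page actually declares in its markup (when it declares one at all); the server's Content-Type response header is another place it may show up. Below is a small standard-library sketch of the first idea; find_charset is just a made-up helper for illustration, not part of cpp-netlib:

  #include <iostream>
  #include <string>

  // Scan the downloaded body for a "charset=..." declaration, e.g. from a
  // <meta> tag.  Returns an empty string if the page declares nothing.
  std::string find_charset(const std::string &content) {
      std::string::size_type pos = content.find("charset=");
      if (pos == std::string::npos)
          return "";                                    // no charset declared anywhere
      pos += std::string("charset=").size();
      if (pos < content.size() && (content[pos] == '"' || content[pos] == '\''))
          ++pos;                                        // skip an opening quote, if any
      std::string::size_type end = content.find_first_of("\"'; >\r\n", pos);
      return content.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
  }

  // Usage with the snippet above:
  //   std::cout << "declared charset: " << find_charset(content) << '\n';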

Sorry for the newbie question, but any help or comment will be much appreciated.

Thanks.


Solution

  • Well, in the end I guess my question simply makes no sense. The fact is that I was trying to read the content of some websites, and I needed to write that content to a .txt file. Before writing it to a file, I was printing it to the terminal, which is where I was seeing the weird characters... Someone from the cpp-netlib Google group told me I should just write it to a file and there would be no problem. And there wasn't: all the non-ASCII characters were written to the file normally.

    In the end, the problem was not about reading UTF-8 encoded content (it was being read correctly), but about displaying it as such in the terminal. That is not a real problem in my case, since what I really needed was UTF-8 encoded text in a file, and that simply works out of the box (see the small sketch below).

    I hope this at least helps someone who gets stuck the way I did.
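    For illustration, here is a minimal sketch of that "just write it to a file" step, assuming content holds the string returned by body(response) in the question (the file name is made up):

      #include <fstream>
      #include <string>

      // Write the raw body bytes straight to a file instead of the terminal.
      // The bytes are already UTF-8, so no conversion is needed.
      void save_body(const std::string &content) {
          std::ofstream out("page.txt", std::ios::binary);  // binary: keep the UTF-8 bytes untouched
          out << content;
      }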