I use libcurl to download data into a stringstream, and I have a function that parses it:
bool parse()
{
    istringstream temp(buff.str());
    buff.str("");
    string line;
    QString line_QStr, lyrics_QStr;
    while (temp.good())
    {
        getline(temp, line);
        if (QString::fromStdString(line).contains(startMarker)) break;
    }
    if (!temp.good()) return false; // something went wrong
    while (temp.good())
    {
        getline(temp, line);
        if ((line_QStr = QString::fromStdString(line)).contains(endMarker))
        {
            lyrics_QStr += line_QStr.remove(endMarker); // remove the </div>
            break;
        }
        else
        {
            lyrics_QStr += line_QStr;
        }
    }
    if (!temp.good()) return false;
    QTextDocument lyricsHtml;
    lyricsHtml.setHtml(lyrics_QStr);
    lyrics_qstr = lyricsHtml.toPlainText();
    return true;
}
When the text is ASCII-only, everything is fine. But if it contains Unicode characters, I lose them somewhere in this function and the output comes out garbled.
I use std::string and getline instead of QTextStream and QString because I couldn't find a counterpart of the good() function, so I couldn't do any decent error handling with them.
What am I doing wrong in this function that causes each Unicode character to be lost and displayed as two other characters? How can I fix it? Thanks in advance!
EDIT: I changed the parse function to this:
bool LyricsManiaDownloader::parse()
{
    wistringstream temp(string2wstring(buff.str()));
    buff.str("");
    wstring line;
    QString line_QStr, lyrics_QStr;
    while (temp.good())
    {
        getline(temp, line);
        if (QString::fromStdWString(line).contains(startMarker)) break;
    }
    if (!temp.good()) return false; // something went wrong
    while (temp.good())
    {
        getline(temp, line);
        if ((line_QStr = QString::fromStdWString(line)).contains(endMarker))
        {
            lyrics_QStr += line_QStr.remove(endMarker); // remove the </div>
            break;
        }
        else
        {
            lyrics_QStr += line_QStr;
        }
    }
    if (!temp.good()) return false;
    QTextDocument lyricsHtml;
    lyricsHtml.setHtml(lyrics_QStr);
    lyrics_qstr = lyricsHtml.toPlainText();
    return true;
}
And the string2wstring function is:
wstring string2wstring(const string &str)
{
    wstring wstr(str.length(), L' ');
    copy(str.begin(), str.end(), wstr.begin());
    return wstr;
}
But there is still a problem with the encoding.
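A likely reason the second version still misbehaves is that copying bytes into a wstring only widens each byte; it does not combine the bytes of a multibyte sequence into one character. The sketch below assumes the page is UTF-8 encoded (an assumption, the question does not state the encoding) and uses a hypothetical helper, widen_bytes, that mirrors string2wstring.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper mirroring string2wstring: each input byte becomes
// exactly one wchar_t, so no multibyte sequence is ever combined.
std::wstring widen_bytes(const std::string &str)
{
    std::wstring wstr(str.length(), L' ');
    std::copy(str.begin(), str.end(), wstr.begin());
    assert(wstr.size() == str.size()); // one wchar_t per byte, never fewer
    return wstr;
}
```

For example, "é" (U+00E9) is the two bytes 0xC3 0xA9 in UTF-8. Passing those two bytes to widen_bytes gives a wstring of length 2 rather than a single wide character, which would match the symptom of one character showing up as two others.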
EDIT2: I use this function for saving data into a stringstream:
size_t write_data_to_var(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    ostringstream * stream = (ostringstream*) userdata;
    size_t count = size * nmemb;
    stream->write(ptr, count);
    return count;
}
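libcurl may call a write callback several times, with chunks of arbitrary size, so a multibyte character can be split across two calls. The sketch below exercises a callback of the same shape without libcurl; collect_in_chunks is a hypothetical stand-in for the transfer, not part of any library. It suggests that accumulating raw bytes first and decoding only once the buffer is complete is the safer order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Same shape as the callback above: append the raw bytes, report how many
// were consumed.
std::size_t write_data_to_var(char *ptr, std::size_t size, std::size_t nmemb, void *userdata)
{
    assert(userdata != nullptr);
    std::ostringstream *stream = static_cast<std::ostringstream *>(userdata);
    const std::size_t count = size * nmemb;
    stream->write(ptr, static_cast<std::streamsize>(count));
    return count;
}

// Hypothetical stand-in for a transfer: feeds `body` to the callback in
// pieces of at most `chunk_size` bytes and returns what was accumulated.
std::string collect_in_chunks(const std::string &body, std::size_t chunk_size)
{
    if (chunk_size == 0) chunk_size = 1; // avoid an endless loop
    std::ostringstream buff;
    std::string copy = body; // the callback takes a non-const char*
    for (std::size_t pos = 0; pos < copy.size(); pos += chunk_size) {
        const std::size_t n = std::min(chunk_size, copy.size() - pos);
        write_data_to_var(&copy[pos], 1, n, &buff);
    }
    return buff.str();
}
```

With a chunk size of 2, the four bytes 'a', 0xC3, 0xA9, 'b' arrive as "a\xC3" and then "\xA9" "b", yet the accumulated buffer is byte-for-byte identical to the original. Decoding each chunk on its own would break the split character.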
I pass the std::ostringstream buff to curl, and the web page data is saved there. Then I convert buff.str() to a wstring and use it as the source for a wistringstream. The conversion from std::string to std::wstring is the decoding, isn't it?
The Web server returns a stream of bytes alongside a header that indicates what encoding those bytes should be understood as.
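As a rough illustration of those two steps, the sketch below first pulls the charset out of a Content-Type header value and then decodes bytes as UTF-8. Both function names are hypothetical, the header format is assumed to look like "text/html; charset=UTF-8", and the decoder is deliberately minimal: it replaces malformed or truncated sequences with U+FFFD and does not reject overlong forms.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the lower-cased charset from a Content-Type
// value, or an empty string if none is given.
std::string charset_from_content_type(const std::string &value)
{
    std::string lower(value);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string key = "charset=";
    const std::size_t start = lower.find(key);
    if (start == std::string::npos) return "";
    const std::size_t begin = start + key.size();
    const std::size_t end = lower.find_first_of("; \t", begin);
    std::string cs = lower.substr(begin, end == std::string::npos ? end : end - begin);
    if (cs.size() >= 2 && cs.front() == '"' && cs.back() == '"')
        cs = cs.substr(1, cs.size() - 2); // strip optional quotes
    return cs;
}

// Hypothetical minimal UTF-8 decoder: bytes in, Unicode code points out.
std::u32string decode_utf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; } // invalid lead byte
        if (i + extra >= bytes.size()) { out.push_back(0xFFFD); break; } // truncated
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; } // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    assert(out.size() <= bytes.size()); // never more code points than bytes
    return out;
}
```

Here the two bytes 0xC3 0xA9 decode to the single code point U+00E9. If the charset turns out to be UTF-8 and the result has to end up in a QString, Qt's QString::fromUtf8 performs this decoding directly, so a hand-written decoder would not be needed in the parse function.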