c++ utf-8 fastcgi++

fastcgipp < no output for utf8 characters


Edit

I solved the issue described below by writing out << L"Swedish: å ä ö Å Ä Ö", that is, with an L prefix before the string literal, as explained in this answer: What exactly is the L prefix in C++? My question now is whether this is a good solution or whether there is a preferred alternative.


The code

The following is an edited version of the response() method from http://www.nongnu.org/fastcgipp/doc/2.1/a00004.html:

    bool response()
    {
       wchar_t russian[]={ 0x041f, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442, 0x0020, 0x043c, 0x0438, 0x0440, 0x0000 };
       wchar_t chinese[]={ 0x4e16, 0x754c, 0x60a8, 0x597d, 0x0000 };
       wchar_t greek[]={ 0x0393, 0x03b5, 0x03b9, 0x03b1, 0x0020, 0x03c3, 0x03b1, 0x03c2, 0x0020, 0x03ba, 0x03cc, 0x03c3, 0x03bc, 0x03bf, 0x0000 };
       wchar_t japanese[]={ 0x4eca, 0x65e5, 0x306f, 0x4e16, 0x754c, 0x0000 };
       wchar_t runic[]={ 0x16ba, 0x16d6, 0x16da, 0x16df, 0x0020, 0x16b9, 0x16df, 0x16c9, 0x16da, 0x16de, 0x0000 };
       out << "Content-Type: text/html; charset=utf-8\r\n\r\n";
       out << "<html><head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' />";
       out << "<title>fastcgi++: Hello World in UTF-8</title></head><body>";
       out << "English: Hello World<br />";
       out << "Russian: " << russian << "<br />";
       out << "Greek: " << greek << "<br />";
       out << "Chinese: " << chinese << "<br />";
       out << "Japanese: " << japanese << "<br />";
       out << "Runic English?: " << runic << "<br />";
       out << "Swedish: å ä ö Å Ä Ö<br />";
       out << "</body></html>";
       return true;
    }

Raw output

Content-Type: text/html; charset=utf-8

<html><head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /><title>fastcgi++: Hello World in UTF-8</title></head><body>English: Hello World<br />Russian: Привет мир<br />Greek: Γεια σας κόσμο<br />Chinese: 世界您好<br />Japanese: 今日は世界<br />Runic English?: ᚺᛖᛚᛟ ᚹᛟᛉᛚᛞ<br />Swedish:      <br /></body></html>

Browser interpretation

English: Hello World
Russian: Привет мир
Greek: Γεια σας κόσμο
Chinese: 世界您好
Japanese: 今日は世界
Runic English?: ᚺᛖᛚᛟ ᚹᛟᛉᛚᛞ
Swedish: 

As seen above, the last (Swedish) line is expected to output "å ä ö Å Ä Ö". Instead, those characters are replaced with whitespace for some reason. There has to be a way to do this without actually typing out the Unicode hexadecimal representation of each letter.

After some Google research I tried adding a setlocale call at the beginning of the main function, with no success.

Why is this occurring?
How can I solve the issue so that I can use any UTF-8 character freely in string literals, in the manner described above?


Solution

  • This works on Linux:

    #include <iostream>
    #include <locale>

    // Prints the page to the wide standard output stream.
    bool response()
    {
        // "Hello world" in several scripts, spelled out as wide code points.
        wchar_t russian[]={ 0x041f, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442, 0x0020, 0x043c, 0x0438, 0x0440, 0x0000 };
        wchar_t chinese[]={ 0x4e16, 0x754c, 0x60a8, 0x597d, 0x0000 };
        wchar_t greek[]={ 0x0393, 0x03b5, 0x03b9, 0x03b1, 0x0020, 0x03c3, 0x03b1, 0x03c2, 0x0020, 0x03ba, 0x03cc, 0x03c3, 0x03bc, 0x03bf, 0x0000 };
        wchar_t japanese[]={ 0x4eca, 0x65e5, 0x306f, 0x4e16, 0x754c, 0x0000 };
        wchar_t runic[]={ 0x16ba, 0x16d6, 0x16da, 0x16df, 0x0020, 0x16b9, 0x16df, 0x16c9, 0x16da, 0x16de, 0x0000 };
        std::wcout << "Content-Type: text/html; charset=utf-8\r\n\r\n" << std::endl;
        std::wcout << "<html><head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' />" << std::endl;
        std::wcout << "<title>fastcgi++: Hello World in UTF-8</title></head><body>" << std::endl;
        std::wcout << "English: Hello World<br />" << std::endl;
        std::wcout << "Russian: " << russian << "<br />" << std::endl;
        std::wcout << "Greek: " << greek << "<br />" << std::endl;
        std::wcout << "Chinese: " << chinese << "<br />" << std::endl;
        std::wcout << "Japanese: " << japanese << "<br />" << std::endl;
        std::wcout << "Runic English?: " << runic << "<br />" << std::endl;
        // The L prefix makes this a wide literal, so the non-ASCII letters survive.
        std::wcout << L"Swedish: å ä ö Å Ä Ö<br />" << std::endl;
        std::wcout << "</body></html>" << std::endl;
        return true;
    }

    int main()
    {
        // Adopt the locale from the environment (e.g. a UTF-8 locale) so that
        // wide characters are encoded correctly on output.
        std::locale::global(std::locale(""));
        response();
    }
    

    Note that (1) the output goes to a wide stream and (2) the Swedish string literal is wide (L"whatever"). The L prefix ("Long") before a string literal makes it a wide string literal (an array of const wchar_t) as opposed to a regular narrow string literal (an array of const char).
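To make the difference concrete, here is a small sketch that counts the elements of a narrow and a wide literal holding the same letter. The names `narrow_a_ring`, `wide_a_ring`, `narrow_units` and `wide_units` are made up for this illustration, and the letter is written with escapes so the result does not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// The letter "å" (U+00E5), spelled with escapes so the example does not
// depend on the encoding of the source file itself.
const char    narrow_a_ring[] = "\xC3\xA5"; // UTF-8: two char elements
const wchar_t wide_a_ring[]   = L"\u00E5";  // wide: one wchar_t element

// Number of code units in each literal, not counting the terminator.
std::size_t narrow_units() { return std::strlen(narrow_a_ring); }
std::size_t wide_units()   { return std::wcslen(wide_a_ring); }
```

Calling `narrow_units()` and `wide_units()` shows that the narrow literal stores the letter as two bytes while the wide literal stores it as a single element, which is why the two kinds of literal cannot simply be mixed on one stream.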

    Narrow string literals don't work here because the narrow character set is UTF-8 by default, and by default nothing converts UTF-8 to the wide encoding (probably UCS4). Each byte is simply widened on its own, so the two bytes of å are treated as two separate characters instead of one, which is totally wrong. If you want, you can convert the string yourself or use one of the standard conversion functions: mbstowcs (not really portable) or C++11 wstring_convert (at the time of writing not really working with gcc/libstdc++, but working with clang/libc++).
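As a rough idea of what "convert it yourself" could look like, here is a minimal hand-written UTF-8 decoder. The function name `utf8_to_wide` is hypothetical, and the sketch assumes a 32-bit wchar_t (as on Linux). It is not a full validator: overlong encodings and surrogate code points are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decode a UTF-8 byte string into one wchar_t per code point.
// Assumes wchar_t can hold any code point (32 bits, as on Linux).
// Invalid lead or continuation bytes are replaced with U+FFFD.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; continue; }

        // Sequence cut off by the end of the input.
        if (i + extra >= in.size()) {
            out.push_back(static_cast<wchar_t>(0xFFFD));
            break;
        }

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F); // append the 6 payload bits
        }

        if (!ok) {
            out.push_back(static_cast<wchar_t>(0xFFFD));
            ++i; // skip the bad lead byte and resynchronise
            continue;
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

With something like this, a narrow UTF-8 string could be passed through `utf8_to_wide` before being written to the wide stream, instead of prefixing every literal with L.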

    How to make this work on Windows is anyone's guess.

    It is recommended to stick to either char and UTF-8, or wchar_t and UCS4 (on Linux). Since you want to output UTF-8, it is reasonable to use char, not wchar_t.
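For comparison, here is a sketch of the all-char, all-UTF-8 route. A plain `std::ostream` stands in for the request's `out` stream here, and `write_page` and `render_page` are hypothetical names; whether and how fastcgi++ lets you use a narrow stream is something to check in its documentation. The sketch also assumes the source file is saved as UTF-8 and that the compiler passes narrow literals through unchanged, which is the default for GCC and Clang.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Write the page as plain UTF-8 bytes to a narrow stream.
// Assumption: this file is saved as UTF-8 and the compiler leaves narrow
// literals untouched, so "å" is emitted as the bytes 0xC3 0xA5 with no
// conversion step and no locale setup.
void write_page(std::ostream& out)
{
    out << "Content-Type: text/html; charset=utf-8\r\n\r\n";
    out << "<html><head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' />";
    out << "<title>Hello World in UTF-8</title></head><body>";
    out << "English: Hello World<br />";
    out << "Russian: Привет мир<br />";
    out << "Swedish: å ä ö Å Ä Ö<br />";
    out << "</body></html>";
}

// Hypothetical helper for inspection: capture the page in a std::string.
std::string render_page()
{
    std::ostringstream buffer;
    write_page(buffer);
    return buffer.str();
}
```

The point of this design is that the bytes in the source file are the bytes sent to the browser, so any character can be typed directly into a literal as long as the declared charset is utf-8.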