c++ windows utf-8 utf windows-11

Reading UTF-8 input


I am making a console-based flashcard program. At startup, I read from a file containing UTF-8 encoded Japanese characters (such as "ひらがな, カタカナ, 患者"). However, when I call std::getline(), the input comes out as "". How can I read UTF-8 input correctly? Maybe by opening STD_INPUT_HANDLE as a file? I call SetConsoleOutputCP() and SetConsoleCP() with CP_UTF8 as the argument to enable UTF-8 printing.

Minimal Reproducible Example, as requested by @πάντα ῥεῖ

#include <iostream>
#include <Windows.h>
#include <fstream>
#include <vector>
#include <string>

void populate(std::vector<std::string>& in) {
    std::ifstream file("words.txt"); // fill this with some UTF-8 characters, then check the contents of [in]

    std::string line;
    while (std::getline(file, line)) {
        in.emplace_back(line);
    }
}

int main() {
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

    SetConsoleTitleA("Example");

    std::vector<std::string> arr;
    populate(arr);

    std::string input_utf8; // type some UTF-8 characters when asked for input
    std::cin >> input_utf8;

    for (std::string s : arr)
        if (input_utf8 == s)
            std::cout << "It works! The input wasn't null!";
}

Solution

  • This program works for me. I needed code page 932 (Shift-JIS) to get things to show up right. (I do not have Japanese enabled on my Windows 10 machine, so it doesn't depend on that.) If I just use std::cin or std::wcin, I can see in the debugger that I am not getting the right input. But if I use ReadConsoleW/WriteConsoleW, everything looks correct.

    #define _CRT_SECURE_NO_WARNINGS
    #include <windows.h>
    #include <iostream>
    
    using namespace std;
    
    int main()
    {
                                            //This code-page-changing stuff, plus the restoring later, is from
                                            //https://www.codeproject.com/articles/34068/unicode-output-to-the-windows-console
        UINT oldcp = GetConsoleOutputCP();  //what is the current code page? store for later
        SetConsoleOutputCP(932);            //set it up so it can do Japanese
    
        cout << "Enter something: "; 
    
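        wchar_t wmsg[32];
        DWORD used = 0;                     //stays 0 if the read fails
        if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), //W version explicitly, since wmsg is wchar_t
            wmsg,
            31, //at most 31 of the 32 slots; the typed line's trailing "\r\n" counts too
            &used,
            nullptr))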
            cerr << "ReadConsole failed, le = " << GetLastError() << endl;
    
        size_t len = used;
        cout << "You entered: ";
        //From https://cboard.cprogramming.com/windows-programming/112382-printing-unicode-console.html
        if (!WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), 
                wmsg, (DWORD) len,
                &used, 0))
                cerr << "WriteConsole failed, le = " << GetLastError() << endl;
        cout << '\n';
    
        cout << "Hit enter to end (and restore previous code page)."; cin.get();
        SetConsoleOutputCP(oldcp);          //only the output code page was changed, so only it is restored
        return 0;
    }
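
The question compares the typed input against UTF-8 lines read from a file, while ReadConsoleW hands back UTF-16 code units that still end in "\r\n". So the input has to be trimmed and converted to UTF-8 before the == comparison can match. Below is a minimal sketch of that step. The helper names strip_line_ending and utf16_to_utf8 are hypothetical, not part of any library. On Windows, WideCharToMultiByte with CP_UTF8 should do the same conversion; the hand-written encoder is used here only so the logic is visible and portable.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: drop the trailing "\r\n" that a console line carries.
std::u16string strip_line_ending(std::u16string s) {
    while (!s.empty() && (s.back() == u'\r' || s.back() == u'\n'))
        s.pop_back();
    return s;
}

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consumed the low surrogate
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {                               // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                       // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                     // 3 bytes (most Japanese text)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                       // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With the buffer from the program above, something like utf16_to_utf8(strip_line_ending(std::u16string(wmsg, wmsg + used))) would give a std::string that can be compared against the lines loaded with std::getline(), assuming the file really is UTF-8 and its first line does not start with a byte order mark.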