c linux widechar

Print Unicode characters on Linux using gcc


I'm trying to print a wchar_t string to the terminal, but the string either doesn't show up or appears as unreadable characters.

I tried this on Xubuntu 22.04 with gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0. Here is the sample code:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t sample1[] = { L"Sample TEXT\\自己人自己人人       AZZZZZZZA己中国中中中\n" };
    printf("AAAA\n");
    printf("%ls", L"ABCD");
    printf("%ls", sample1);
    return 0;
}

and I compile it with gcc as follows:

gcc test.c -fshort-wchar -o test

I write the data to a file on Windows as Unicode, and I need to read that file and print its contents on Linux. wchar_t is 16 bits on Windows but 32 bits on Linux, which is why I used the -fshort-wchar gcc flag.

The only output I see from the above code is "AAAA\n"; nothing else is printed.

What is the issue with my code? How can I print a Unicode wchar_t string in C properly and be able to read it in my terminal?

To rephrase my question as suggested in the first comment: I have a file saved as UTF-16 on Windows. How do I print it on Linux?

Thanks


Solution

  • What is the issue with my code?

    The issue is that you compiled with -fshort-wchar, while glibc was compiled to work with a 32-bit wchar_t. As a result, printf("%ls", ...) reads the memory as an array of 32-bit elements, while your array has 16-bit elements.
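To make the size mismatch concrete, here is a minimal sketch of a startup check. The names wchar_is_32bit and require_32bit_wchar are hypothetical, and the check assumes glibc on Linux, where the wide-character functions expect a 4-byte wchar_t.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: reports whether wchar_t has the 4-byte size
   that glibc's wide-character functions expect on Linux. */
int wchar_is_32bit(void)
{
    return sizeof(wchar_t) == 4;
}

/* Hypothetical guard: call once at startup, before any "%ls" output.
   It aborts if the program was built with a different wchar_t size,
   for example with -fshort-wchar. */
void require_32bit_wchar(void)
{
    assert(wchar_is_32bit() && "wchar_t is not 32-bit; was -fshort-wchar used?");
}
```

In a program built with -fshort-wchar, sizeof(wchar_t) is 2, so the guard would abort instead of letting printf misread the array.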

  • How can I print unicode wchar_t in C properly and be able to read it in my terminal?

    Do not use -fshort-wchar, or else compile everything you use, including the C standard library and any other libraries you intend to use, with -fshort-wchar.
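One way to live without -fshort-wchar is to keep the 16-bit data in uint16_t and widen it to the native 32-bit wchar_t before printing. The following is only a sketch of that idea, not something from the question: utf16_to_wchar is a hypothetical helper, and it assumes the code units are already in host byte order (true for a Windows-written file read on a little-endian machine).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: decode n UTF-16 code units into 32-bit wchar_t
   code points. Writes at most cap - 1 code points plus a terminating
   L'\0' and returns the number of code points written. */
size_t utf16_to_wchar(const uint16_t *in, size_t n, wchar_t *out, size_t cap)
{
    assert(in != NULL && out != NULL && cap > 0);

    size_t written = 0;
    for (size_t i = 0; i < n && written + 1 < cap; i++) {
        uint32_t cp = in[i];
        /* A high surrogate followed by a low surrogate encodes one
           code point above U+FFFF. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;   /* unpaired surrogate: substitute U+FFFD */
        }
        out[written++] = (wchar_t)cp;
    }
    out[written] = L'\0';
    return written;
}
```

In a program built without -fshort-wchar, the resulting buffer can be passed to printf("%ls", out) after setlocale(LC_ALL, "en_US.UTF-8").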

  • I write the data to a file on Windows as unicode and I should read the file and print its content on Linux

    Then you have to know which "Unicode" encoding Windows used when writing the file. Once that is known, it is typical to use the iconv command or function to convert the file. You can also use libraries such as libunistring or ICU to handle Unicode.
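As an illustration of the iconv function route, here is a hedged sketch that converts a byte buffer to UTF-8. It assumes the file is UTF-16LE, possibly starting with an FF FE byte-order mark, which is only a guess about what Windows wrote. The helper name utf16le_to_utf8 is hypothetical; iconv_open, iconv and iconv_close are the POSIX calls provided by glibc.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert a UTF-16LE byte buffer to UTF-8.
   Returns the number of bytes written to out, or (size_t)-1 on error. */
size_t utf16le_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    /* Skip a leading byte-order mark (FF FE) if present. */
    if (in_len >= 2 &&
        (unsigned char)in[0] == 0xFF && (unsigned char)in[1] == 0xFE) {
        in += 2;
        in_len -= 2;
    }

    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;     /* iconv takes a non-const pointer */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;      /* invalid input or output buffer too small */
    return out_cap - dst_left;
}
```

The converted bytes can then be written to a UTF-8 terminal with fwrite(out, 1, n, stdout). For a one-off conversion, the command-line equivalent under the same encoding assumption would be iconv -f UTF-16LE -t UTF-8 file.txt.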