unicode codepages polish

How can I detect the code page of a text stream with 2 bytes per character? It's Polish


How can I detect the code page of a stream of text that uses 2 bytes per character? The text is Polish. For ordinary English characters, a 0x00 byte is simply appended to the ANSI code; for the special Polish characters, the two bytes have some special meaning. There is no file header, just a byte stream like this.

Sample here

string: Połączenia

bytes: 50 00/6f 00/42 01/05 01/63 00/7a 00/65 00/69 00/61 00

I don't think it's Unicode, because 0x4201 in Unicode is a Chinese character, not a Polish one.

Can anyone help me? Thanks very much!


Solution

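  • It's UTF-16 Little Endian (UTF-16LE) without a byte order mark. Each 16-bit code unit is stored low byte first, so the bytes 42 01 are U+0142 (ł) and 05 01 are U+0105 (ą), not U+4201. Note that plain hexdump prints 16-bit words in host byte order, which swaps each byte pair on a little-endian machine; -C prints the bytes in stream order.

    $ echo -n "Połączenia" | iconv -f UTF-8 -t UTF-16LE | hexdump -C
    00000000  50 00 6f 00 42 01 05 01  63 00 7a 00 65 00 6e 00  |P.o.B...c.z.e.n.|
    00000010  69 00 61 00                                       |i.a.|
    00000014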
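
Since the stream has no header (no byte order mark), the byte order has to be guessed from the data. Below is a minimal Python sketch of one way to do that. The function name `guess_utf16_byte_order` and the heuristic are illustrative assumptions, not a standard API: it assumes the text is mostly Latin script (as Polish is), so the high byte of almost every code unit is 0x00 or 0x01. It would not be reliable for CJK or other scripts.

```python
def guess_utf16_byte_order(data: bytes) -> str:
    """Hypothetical heuristic: guess the byte order of BOM-less UTF-16 text.

    Assumes mostly Latin-script text, where the high byte of each
    16-bit code unit is usually 0x00 or 0x01.
    """
    if len(data) < 2:
        raise ValueError("need at least one 16-bit code unit")
    first_bytes = data[0::2]   # first byte of every 2-byte unit
    second_bytes = data[1::2]  # second byte of every 2-byte unit
    small_first = sum(b <= 0x01 for b in first_bytes)
    small_second = sum(b <= 0x01 for b in second_bytes)
    # If the small (high) bytes come second, the low byte comes first: little endian.
    return "utf-16-le" if small_second >= small_first else "utf-16-be"


text = "Połączenia"
raw = text.encode("utf-16-le")  # the explicit -le codec writes no BOM

encoding = guess_utf16_byte_order(raw)
decoded = raw.decode(encoding)

# The same two bytes, 42 01, read in both byte orders:
pair = bytes.fromhex("4201")
as_le = pair.decode("utf-16-le")  # U+0142, the Polish letter ł
as_be = pair.decode("utf-16-be")  # U+4201, a CJK ideograph

print(encoding, decoded)
print(hex(ord(as_le)), hex(ord(as_be)))
```

Once the byte order is known, decoding the stream with the matching codec is enough; no separate Polish code page is involved.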