
What's the most efficient way to decode a UTF-16 binary in Rebol 3?


Since Rebol 3 supports Unicode and uses UTF-16 internally when needed (a string containing only ASCII characters is stored as ASCII), decoding should be as simple as copying the memory content from the binary and setting up the REBVAL structure. However, the only way I can find is to iterate over the binary and convert each character individually.

The same question applies to encoding a string as UTF-16.
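For comparison, decoding UTF-16 at the byte level is a single pass over 16-bit code units: combine each byte pair in the chosen byte order, and merge a high/low surrogate pair into one code point. The C sketch below illustrates that loop. It is not Rebol's internal code, and the names `utf16_order`, `read_unit` and `utf16_decode` are hypothetical. It also suggests why a plain memory copy is only safe under assumptions: the input's byte order must match the host, and the target representation must store one character per 16-bit unit with no surrogates to merge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of the UTF-16 data (hypothetical type for this sketch). */
typedef enum { UTF16_LE, UTF16_BE } utf16_order;

/* Read the 16-bit code unit starting at byte offset i in the given order. */
static uint16_t read_unit(const uint8_t *b, size_t i, utf16_order order) {
    return order == UTF16_LE
        ? (uint16_t)(b[i] | (b[i + 1] << 8))
        : (uint16_t)((b[i] << 8) | b[i + 1]);
}

/* Decode len bytes of UTF-16 into code points stored in out (capacity cap).
   Returns the number of code points written, or -1 on malformed input
   (odd byte count, unpaired or truncated surrogate). */
static long utf16_decode(const uint8_t *b, size_t len, utf16_order order,
                         uint32_t *out, size_t cap) {
    size_t n = 0;
    if (len % 2 != 0) return -1;
    for (size_t i = 0; i < len; i += 2) {
        uint32_t u = read_unit(b, i, order);
        if (u >= 0xD800 && u <= 0xDBFF) {          /* high surrogate */
            if (i + 3 >= len) return -1;           /* no room for the low half */
            uint32_t lo = read_unit(b, i + 2, order);
            if (lo < 0xDC00 || lo > 0xDFFF) return -1;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                                /* consumed a second unit */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return -1;                             /* stray low surrogate */
        }
        assert(n < cap);                           /* caller sizes out to len / 2 */
        out[n++] = u;
    }
    return (long)n;
}
```

Sizing `out` to `len / 2` code points is always enough, because every code point consumes at least one 16-bit unit.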


Solution

  • OK, there doesn't seem to be an easy way to do it, so I added two codecs, UTF-16LE and UTF-16BE, for this purpose. See this commit: https://github.com/zsx/r3/commit/630945070eaa4ae4310f53d9dbf34c30db712a21

    With this change, you can do:

    >> b: encode 'utf-16le "hello"
    == #{680065006C006C006F00}

    >> s: decode 'utf-16le b
    == "hello"

    >> b: encode 'utf-16be "hello"
    == #{00680065006C006C006F}

    >> s: decode 'utf-16be b
    == "hello"
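The two encoded binaries above differ only in byte order: `h` (U+0068) becomes `68 00` in UTF-16LE and `00 68` in UTF-16BE. A minimal C sketch of the encoding direction is below. It is not taken from the commit, and the names `utf16_order`, `put_unit` and `utf16_encode` are hypothetical. It writes each 16-bit unit in the requested order and splits code points above U+FFFF into a surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of the UTF-16 output (hypothetical type for this sketch). */
typedef enum { UTF16_LE, UTF16_BE } utf16_order;

/* Write one 16-bit code unit at out[n] in the given order; return new length. */
static size_t put_unit(uint8_t *out, size_t n, uint16_t u, utf16_order order) {
    uint8_t hi = (uint8_t)(u >> 8), lo = (uint8_t)(u & 0xFF);
    out[n]     = order == UTF16_LE ? lo : hi;
    out[n + 1] = order == UTF16_LE ? hi : lo;
    return n + 2;
}

/* Encode count code points as UTF-16 into out, which must hold at least
   4 * count bytes (a surrogate pair is two units). Returns the number of
   bytes written, or -1 for a surrogate or out-of-range code point. */
static long utf16_encode(const uint32_t *cps, size_t count, utf16_order order,
                         uint8_t *out) {
    size_t n = 0;
    assert(count == 0 || (cps != NULL && out != NULL));
    for (size_t k = 0; k < count; k++) {
        uint32_t c = cps[k];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return -1;
        if (c < 0x10000) {
            n = put_unit(out, n, (uint16_t)c, order);  /* BMP: one unit */
        } else {
            c -= 0x10000;                              /* 20 bits left */
            n = put_unit(out, n, (uint16_t)(0xD800 + (c >> 10)), order);
            n = put_unit(out, n, (uint16_t)(0xDC00 + (c & 0x3FF)), order);
        }
    }
    return (long)n;
}
```

Encoding `"hello"` this way yields the same bytes as the console session, `68 00 65 00 ...` for `UTF16_LE` and `00 68 00 65 ...` for `UTF16_BE`.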