```go
var int32s = []int32{
	8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
}
fmt.Println("word: ", string(int32s))
```
```js
let int32s = [8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26]
str = String.fromCharCode.apply(null, int32s);
console.log("word: " + String.fromCharCode.apply(null, int32s))
```
The 2 results above are not the same for some empty characters.

Is there any way to modify the Go code so that it produces the same result as the JS one?
To cite the docs on `String.fromCharCode`:

> The static `String.fromCharCode()` method returns a string created from the specified sequence of UTF-16 code units.

So each number in your `int32s` array is interpreted as a 16-bit integer providing a Unicode code unit, and the whole sequence is interpreted as a series of code units forming a UTF-16-encoded string.

I'd stress the last point because, judging from the name of the variable, `int32s`, whoever wrote the JS code appears to have an incorrect idea of what is happening there.
Now back to the Go counterpart. Go does not have built-in support for UTF-16 encoding in its strings; they are normally encoded using UTF-8 (though they are not required to be, but let's not digress). Go also provides the `rune` data type, which is an alias for `int32`.
A rune is a Unicode code point, that is, a number able to hold a complete Unicode character.
(I'll get back to this fact and its relation to the JS code in a moment.)
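As a tiny illustration of that point (a sketch of my own using only the standard library):

```go
package main

import "fmt"

func main() {
	// A rune literal is just an int32 value holding a Unicode code point.
	var r rune = 'ë' // U+00EB
	fmt.Println(int32(r))  // 235
	fmt.Println(string(r)) // ë
	fmt.Printf("%U\n", r)  // U+00EB
}
```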
Now, what's wrong with your `string(int32s)` is that it interprets your slice of `int32`s the same way as a `[]rune` (remember that `rune` is an alias for `int32`): it takes each number in the slice to represent a single Unicode character and produces a string of them.
(This string is internally encoded as UTF-8, but that fact is not really relevant to the problem.)

In other words, the difference is this: the JS code decodes the numbers as UTF-16 code units, while the Go code treats each number as a complete Unicode code point.
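To make that difference tangible, here is a small sketch of my own using a character outside the Basic Multilingual Plane, which is a single code point but takes two UTF-16 code units:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	// U+1F642 (🙂) is encoded in UTF-16 as the surrogate pair 0xD83D, 0xDE42.
	units := []uint16{0xD83D, 0xDE42}

	// Interpreting the numbers as UTF-16 code units (what the JS code does):
	fmt.Println(string(utf16.Decode(units))) // 🙂

	// Interpreting each number as a complete code point (what string(int32s) does):
	fmt.Println(string([]rune{0xD83D, 0xDE42})) // two replacement characters: ��
}
```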
The Go standard library provides a package to deal with UTF-16 encoding, `unicode/utf16`, and we can use it to do what the JS code does: decode a UTF-16-encoded string into a sequence of Unicode code points, which we can then convert to a Go string:
```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	var uint16s = []uint16{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	runes := utf16.Decode(uint16s)
	fmt.Println("word: ", string(runes))
}
```
(Note that I've changed the type of the slice to `[]uint16` and renamed it accordingly. Also, I've decoded the source slice into an explicitly named variable; this is done for clarity, to highlight what's happening.)
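If your data really arrives as a `[]int32`, as in the original snippet, you will first need to narrow it to `[]uint16`; here is a minimal sketch (the helper name `toUint16s` is my own):

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUint16s narrows a slice of int32 code-unit values to []uint16 so it
// can be passed to utf16.Decode; it assumes every value fits in 16 bits,
// as UTF-16 code units do.
func toUint16s(in []int32) []uint16 {
	out := make([]uint16, len(in))
	for i, v := range in {
		out[i] = uint16(v)
	}
	return out
}

func main() {
	var int32s = []int32{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	fmt.Println("word: ", string(utf16.Decode(toUint16s(int32s))))
}
```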
This code produces the same gibberish as the JS code does in the Firefox console.
Update on the

> 2 results above are not the same for some empty characters.

bit, which I did not touch.
The problem, as I understand it, is that your Go code prints something like

```
ýP8ÜÙ*ë!ÓçØê
```

while the JS code prints

```
�ýP8�ÜÙ*ë!Ó�çØê�
```

right?
The problem here is in the different ways `fmt.Println` and `console.log` interpret the resulting string.
Let me first state that your Go code happens to work correctly even without the proper decoding I've suggested, because all the integers in the slice are UTF-16 code units in the "basic" range, so the "dumb" conversion works and produces the same string as the JS code does.
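To double-check that claim with the data from the question, here is a small sketch comparing the naive conversion with proper UTF-16 decoding (my own verification, not part of the original code):

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	var int32s = []int32{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	uint16s := make([]uint16, len(int32s))
	for i, v := range int32s {
		uint16s[i] = uint16(v)
	}
	// Every value is below the surrogate range (0xD800–0xDFFF), so each
	// code unit is itself a complete code point and both paths agree.
	fmt.Println(string(int32s) == string(utf16.Decode(uint16s))) // true
}
```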
To see both strings "as is", you could do this:
For Go, use `fmt.Printf` with the `%q` verb to see "special" Unicode (and ASCII) characters "escaped" using the Go rules in the printout:

```go
fmt.Printf("%q\n", string(int32s))
```

produces

```
"\býP8\x1eÜÙ*ë!Ó\x17çØê\x1a"
```
Notice the `'\b'`, `'\x1e'`, and other escapes: as you can see, these are control characters, which are not printable.
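If you want to see exactly which runes are the control characters, here is a small sketch (my addition) that walks the string and flags them:

```go
package main

import (
	"fmt"
	"unicode"
)

func main() {
	s := "\býP8\x1eÜÙ*ë!Ó\x17çØê\x1a"
	for _, r := range s {
		// %U prints the code point; unicode.IsControl reports the
		// non-printable control characters (\b, \x1e, \x17, \x1a here).
		fmt.Printf("%U control=%v\n", r, unicode.IsControl(r))
	}
}
```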
For JS, print the value of the resulting string without using `console.log`: just save it in a variable, then enter the variable's name at the console and hit Enter to have its value printed "as is":
```
> let int32s = [8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26]
> str = String.fromCharCode.apply(null, int32s);
> str
"\u0008ýP8\u001eÜÙ*ë!Ó\u0017çØê\u001a"
```
Note that the string contains `\uXXXX` escapes. They define Unicode code points (by the way, Go supports the same syntax), and these escapes define the same code points as can be seen in the Go example.
As you can see, the strings produced are the same; the only difference is that Go's string is encoded in UTF-8, and because of this, peering into its contents using `fmt.Printf` and `%q` looks at the encoded bytes, which is why Go prints the escapes using the "minimal" encoding. We could use the escaping from the JS example as well: you can check that running

```go
fmt.Println("\býP8\x1eÜÙ*ë!Ó\x17çØê\x1a" == "\u0008ýP8\u001eÜÙ*ë!Ó\u0017çØê\u001a")
```

prints `true`.
So, as you can see by now, `console.log` replaces each non-printable character with the special Unicode code point U+FFFD, which is called the Unicode replacement character, usually rendered as a black rhombus with a white question mark in it.
Go's `fmt.Println` does not do that: it merely sends these bytes "as is" to the output.
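If you actually want the Go program's output to look closer to what the Firefox console shows, one option (a sketch, not the only way to do it) is to map non-printable runes to U+FFFD yourself:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf16"
)

func main() {
	var uint16s = []uint16{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	s := string(utf16.Decode(uint16s))
	// Replace every non-printable rune with U+FFFD, roughly mimicking
	// how the browser console renders control characters.
	shown := strings.Map(func(r rune) rune {
		if !unicode.IsPrint(r) {
			return '\uFFFD'
		}
		return r
	}, s)
	fmt.Println("word: ", shown)
}
```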
Hope this explains the observed difference.