In Go, when I use json.Marshal on a []byte and then json.Unmarshal into a []byte, I get back the same []byte I used as input.
But when I json.Unmarshal into an interface{}, I get a string.
Example here: https://goplay.tools/snippet/5BfFZ-Uq507
I've read json.Unmarshal documentation (https://pkg.go.dev/encoding/json#Unmarshal) & this issue https://github.com/golang/go/issues/16815.
I understand that []byte and string are not the same type, and that it's logical to get a result different from string([]byte("BOOKS")) if I json.Unmarshal into a string.
But since I unmarshalled into interface{}, I expected the type to be []byte and to get my original []byte back, not a string.
This is a problem for me because, when unmarshalling data into map[string]interface{}, I can't tell the difference between what was originally a string and what was originally a []byte.
Example: https://goplay.tools/snippet/MVSR7_MvSv-
Is there any way to solve my issue?
I initially left a comment because this seemed like a trivial issue, although the questions you're asking and the things you mention suggest that there are actually a fair few things to unpack.
What are these types? Let's start with that. As per the spec, the byte type is an alias for uint8. A string is effectively a sequence of bytes, and therefore a sequence of uint8 values. It is its own type, but let's take a closer look:
A string type represents the set of string values. A string value is a (possibly empty) sequence of bytes. The number of bytes is called the length of the string and is never negative. Strings are immutable: once created, it is impossible to change the contents of a string. The predeclared string type is string; it is a defined type.
With this in mind, you can see that a string can be copied and converted safely to a []byte, but the main difference here is that a string is immutable, whereas a []byte is not:
s := "this is an immutable string"
cpy := []byte(s)
cpy[0] = 'T'
s2 := string(cpy)
fmt.Printf("%s != %s\n", s, s2)
This is all to say that, for the purposes of marshalling something, the input is treated as immutable, and therefore there is no practical difference between []byte and string.
Cool, but didn't I just say that []byte is an alias for []uint8? Correct, so at this point you'd still expect a []byte to be encoded as [1, 2, 3, 4, ...]. So let's take a look at the source code of the encoding/json package. In particular, this function stands out:
func newSliceEncoder(t reflect.Type) encoderFunc {
	// Byte slices get special treatment; arrays don't.
	if t.Elem().Kind() == reflect.Uint8 {
		p := reflect.PointerTo(t.Elem())
		if !p.Implements(marshalerType) && !p.Implements(textMarshalerType) {
			return encodeByteSlice
		}
	}
	enc := sliceEncoder{newArrayEncoder(t)}
	return enc.encode
}
Notice the comment "Byte slices get special treatment", followed by encodeByteSlice being returned as the encoderFunc. Clearly, we get a different encoder callback when dealing with a slice of bytes, so let's look at what that encoder function looks like:
func encodeByteSlice(e *encodeState, v reflect.Value, _ encOpts) {
	if v.IsNil() {
		e.WriteString("null")
		return
	}
	s := v.Bytes()
	encodedLen := base64.StdEncoding.EncodedLen(len(s))
	e.Grow(len(`"`) + encodedLen + len(`"`))
	// TODO(https://go.dev/issue/53693): Use base64.Encoding.AppendEncode.
	b := e.AvailableBuffer()
	b = append(b, '"')
	base64.StdEncoding.Encode(b[len(b):][:encodedLen], s)
	b = b[:len(b)+encodedLen]
	b = append(b, '"')
	e.Write(b)
}
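If you want to see the effect of that special case without reading the encoder, here is a minimal sketch. The mustMarshal helper is something I made up for this example (it is not part of encoding/json); everything else is the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mustMarshal is a hypothetical helper for this sketch: it marshals v
// and panics on error so the calls below stay short.
func mustMarshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A byte slice hits encodeByteSlice: base64 inside a JSON string.
	fmt.Println(mustMarshal([]byte("BOOKS"))) // "Qk9PS1M="

	// A byte array does not get the special treatment: JSON array of numbers.
	fmt.Println(mustMarshal([5]byte{'B', 'O', 'O', 'K', 'S'})) // [66,79,79,75,83]

	// A plain string is written as-is (quoted and escaped).
	fmt.Println(mustMarshal("BOOKS")) // "BOOKS"

	// A nil byte slice takes the v.IsNil() branch.
	fmt.Println(mustMarshal([]byte(nil))) // null
}
```

Note that the JSON produced for a []byte is not the same text as the JSON produced for the equivalent string: the quotes are there in both cases, but the byte slice is base64-encoded first.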
And there we have it: a byte slice is handled specifically. Its contents are base64-encoded and written to the buffer delimited by ", meaning the value will be encoded as a JSON string. Just like that, we can perfectly explain the behaviour you've observed:
- A slice of bytes ([]byte, which is a valid []uint8) is treated as a special case
Now when it comes to unmarshalling, what's going on with your var dataAny any case? Well, let's look at the source code for the unmarshalling, specifically this part:
case '"': // string
	s, ok := unquoteBytes(item)
	if !ok {
		if fromQuoted {
			return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
		}
		panic(phasePanicMsg)
	}
	switch v.Kind() {
	default:
		d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
	case reflect.Slice:
		if v.Type().Elem().Kind() != reflect.Uint8 {
			d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
			break
		}
		b := make([]byte, base64.StdEncoding.DecodedLen(len(s)))
		n, err := base64.StdEncoding.Decode(b, s)
		if err != nil {
			d.saveError(err)
			break
		}
		v.SetBytes(b[:n])
	case reflect.String:
		if v.Type() == numberType && !isValidNumber(string(s)) {
			return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
		}
		v.SetString(string(s))
	case reflect.Interface:
		if v.NumMethod() == 0 {
			v.Set(reflect.ValueOf(string(s)))
		} else {
			d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
		}
	}
This covers both of your unmarshal cases quite nicely. The JSON encoded input starts with a ", so we enter the case that deals with unmarshalling strings. We get the data from the input minus the quotes as a slice of bytes (unquoteBytes()). Next, we check what type the destination (v) for the unmarshalled data is. We accept 3 types:
- a slice: if the element type is not uint8, we return an error (meaning we only really accept []byte)
- a string
- any, or an interface type
If the destination is of type any, we do a quick check to make sure that the destination truly is an empty interface (i.e. we're not trying to write data to something other than a literal empty interface), and if so, we call
v.Set(reflect.ValueOf(string(s)))
We explicitly set its value to a string, because we are unmarshalling a string.
When the destination is a []byte, we base64-decode the string into a fresh byte slice and store it with v.SetBytes(b[:n]). Simple as can be.
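To make the three destination cases concrete, here is a small sketch that unmarshals the same JSON string into a []byte, a string, and an any. The decodeAll function is a name I invented for the example; it only wraps three json.Unmarshal calls.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAll is a hypothetical helper: it unmarshals the same JSON input
// into the three destination kinds accepted for a JSON string.
func decodeAll(data []byte) (b []byte, s string, a any, err error) {
	if err = json.Unmarshal(data, &b); err != nil {
		return nil, "", nil, err
	}
	if err = json.Unmarshal(data, &s); err != nil {
		return nil, "", nil, err
	}
	if err = json.Unmarshal(data, &a); err != nil {
		return nil, "", nil, err
	}
	return b, s, a, nil
}

func main() {
	data, err := json.Marshal([]byte("BOOKS"))
	if err != nil {
		panic(err)
	}

	b, s, a, err := decodeAll(data)
	if err != nil {
		panic(err)
	}

	fmt.Printf("%T %q\n", b, b) // []uint8 "BOOKS": base64-decoded
	fmt.Printf("%T %q\n", s, s) // string "Qk9PS1M=": the raw JSON string
	fmt.Printf("%T %q\n", a, a) // string "Qk9PS1M=": any always receives a string
}
```

Only the destination type tells the decoder that the JSON string is supposed to be base64; with any, that information simply isn't there.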
Now what you're actually looking for is a way to ensure that what is marshalled as a []byte is unmarshalled as a []byte. From the code above, it should be fairly obvious by now that this can't be done when the destination is any: a JSON string always comes back as a Go string. You can force something like this by converting your []byte to an []int:
s := "foobar"
si := make([]int, 0, len(s))
for _, c := range []byte(s) {
	si = append(si, int(c))
}
But that makes the marshalled data very silly. It's only really useful if both parties involved in the data exchange know what to do with slices/arrays of numbers, and there are no cases where you actually want to send a slice of numeric values that shouldn't be interpreted as a string.
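For completeness, this is roughly what that []int detour looks like end to end. It is only a sketch: toInts and roundTrip are names I made up, and the point is mostly to show what you get back when the destination is any.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toInts is a hypothetical helper that widens each byte to an int so the
// encoder no longer sees a byte slice.
func toInts(b []byte) []int {
	out := make([]int, 0, len(b))
	for _, c := range b {
		out = append(out, int(c))
	}
	return out
}

// roundTrip is a hypothetical helper: marshal v, then unmarshal into any.
func roundTrip(v any) (string, any, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", nil, err
	}
	var out any
	if err := json.Unmarshal(data, &out); err != nil {
		return "", nil, err
	}
	return string(data), out, nil
}

func main() {
	encoded, decoded, err := roundTrip(toInts([]byte("foobar")))
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)                   // [102,111,111,98,97,114]
	fmt.Printf("%T %v\n", decoded, decoded) // []interface {} [102 111 111 98 97 114]
}
```

So the value is now distinguishable from a string, but it comes back as a []any holding float64 values, not as a []byte. The receiving side still has to convert it by hand.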
This all leads into the last point of note: you mentioned unmarshalling the data into a map[string]any. This makes me think we're dealing with an X-Y problem here.
Sometimes, you need to unmarshal data whose type you can't know (usually data you need to pass on to some other process that will be able to identify what the data means, and how to process it). In those (rare) cases, using a map[string]any can be a useful validation step to make sure you're not sending malformed payloads to that other process.
However, your trying to force a string to be represented as a []byte suggests you're very much aware of what data you're dealing with, how it ought to be represented, what it means, and which fields need to be handled in this particular way. If that is the case, why bother with the whole map[string]any mess? For that to work, you'll have to litter your code with hard-coded map keys to extract the bits of data you need. Just create a type that implements the MarshalJSON and UnmarshalJSON methods (the json.Marshaler and json.Unmarshaler interfaces), and you can handle specific fields in specific ways. You could, even though I'm still finding it impossible to think of a valid reason for it, convert strings to int slices and back again in the marshalling process.