I'm trying to build a program that extracts data from a JSON document and puts it into a custom struct. The JSON contains keys like "foo\u00a0", so I have to use tags to get these values.
I have this code:
package main

import (
	"encoding/json"
	"fmt"
)

type MyStruct struct {
	X string `json:"foobar\u0062"`
	Y string `json:"foobaz\u00a0"`
}

func main() {
	data := []byte(`{"foobar\u0062": "Bar", "foobaz\u00a0": "Baz"}`)
	var ms MyStruct
	err := json.Unmarshal(data, &ms)
	if err != nil {
		panic(err)
	}
	fmt.Printf("First: %s\n", ms.X)
	fmt.Printf("Second: %s\n", ms.Y)
}
But it prints:
First: Bar
Second:
It does not print the second value.
I tested it with different values from the Latin-1 Supplement block and, apparently, 00b5 and 00f9 work, but 00a1, 00a2, 00ab, 00af and 00b0 do not.
My questions:
1. Why can foobar\u0062 be used as a tag but not foobaz\u00a0?
2. How can I get the value of foobar\u00a0 in a JSON?

The struct tag allows including special characters like \u00a0; see this example to prove it:
package main

import (
	"fmt"
	"reflect"
)

type MyStruct struct {
	X string `json:"foobar\u0062"`
	Y string `json:"foobaz\u00a0"`
}

func main() {
	u := MyStruct{}
	t := reflect.TypeOf(u)
	for _, fieldName := range []string{"X", "Y"} {
		field, found := t.FieldByName(fieldName)
		if !found {
			continue
		}
		// field.Tag is the raw tag; Tag.Get unquotes the value stored under the key.
		fmt.Printf("\nField: %s\n", fieldName)
		fmt.Printf("\tWhole tag value : %s\n", field.Tag)
		fmt.Printf("\tValue of 'json': %q\n", field.Tag.Get("json"))
	}
}
This outputs (try it on the Go Playground):
Field: X
Whole tag value : json:"foobar\u0062"
Value of 'json': "foobarb"
Field: Y
Whole tag value : json:"foobaz\u00a0"
Value of 'json': "foobaz\u00a0"
But the encoding/json package is more strict and it does not allow such characters. The restriction is in encoding/json/encode.go:
func isValidTag(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		switch {
		case strings.ContainsRune("!#$%&()*+-./:;<=>?@[]^_{|}~ ", c):
			// Backslash and quote chars are reserved, but
			// otherwise any punctuation chars are allowed
			// in a tag name.
		case !unicode.IsLetter(c) && !unicode.IsDigit(c):
			return false
		}
	}
	return true
}
So the json tag value of "foobar\u0062" is valid because '\u0062' is simply the 'b' character, which is allowed.

And a json tag value of "foobaz\u00a0" is deemed invalid ('\u00a0' is neither a letter nor a digit, so it is not accepted by isValidTag()). The package then ignores the tag name and falls back to the field name Y, which matches no key in the input, so the value will not be unmarshaled. This restriction is historical and was added so that a json key can also be used for other purposes, such as protobuf keys.
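To see how this rule sorts the code points from the question, here is a small sketch that copies the logic of isValidTag() quoted above into a standalone function. The name tagNameAccepted is hypothetical (the real function is unexported), and the copy only mirrors the snippet shown here, not necessarily every Go version.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tagNameAccepted is a hypothetical copy of the isValidTag logic quoted
// above; the original is unexported in encoding/json.
func tagNameAccepted(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		switch {
		case strings.ContainsRune("!#$%&()*+-./:;<=>?@[]^_{|}~ ", c):
			// Explicitly allowed ASCII punctuation (and the ASCII space).
		case !unicode.IsLetter(c) && !unicode.IsDigit(c):
			return false
		}
	}
	return true
}

func main() {
	// 'b' and the no-break space, followed by the code points from the question.
	runes := []rune{0x0062, 0x00a0, 0x00b5, 0x00f9, 0x00a1, 0x00a2, 0x00ab, 0x00af, 0x00b0}
	for _, r := range runes {
		fmt.Printf("U+%04X accepted: %v\n", r, tagNameAccepted("foo"+string(r)))
	}
}
```

Only U+0062 (b), U+00B5 (µ) and U+00F9 (ù) are reported as accepted, because they are letters; the no-break space and the other symbols are rejected.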
If you want to unmarshal such input JSON using the encoding/json standard library package, you can't use struct tags. Use a map, for example:
data := []byte(`{"foobar\u0062": "Bar", "foobaz\u00a0": "Baz"}`)
var m map[string]any
err := json.Unmarshal(data, &m)
if err != nil {
	panic(err)
}
fmt.Println("X:", m["foobar\u0062"])
fmt.Println("Y:", m["foobaz\u00a0"])
This will output (try it on the Go Playground):
X: Bar
Y: Baz