xmlgounmarshallingnull-character

xml.Unmarshal - parsing null character entity yields error: illegal character code U+0000


I know this is kinda weird thing to do, but it is part if an integration test, where I am testing whether system does not get tricked by \0, sent as URL encoded %%0. The system is indeed not tricked and echoes the value back as an XML error. But my test code fails then, trying to parse it. This is IMO valid XML:

     <Error>
        <Code>1234</Code>
        <Message>Test&#0;afterzero</Message>
     </Error>

But this code would have an error, instead of a string with \0 in it:

package main

import (
    "encoding/xml"
    "fmt"
)

type ErrorMsg struct {
    XMLName xml.Name `xml:"Error"`
    Code    string   `xml:"Code"`
    Message string   `xml:"Message"`
}

func main() {

    var data = []byte(`
          <Error>
            <Code>1234</Code>
            <Message>Test&#0;afterzero</Message>
          </Error>
       `)

    b := ErrorMsg{}
    o := xml.Unmarshal([]byte(data), &b)
    fmt.Println(o)
    fmt.Println(b)
}

See the example here: https://go.dev/play/p/1fKyuzZPUM8

Can I do something? I suppose I could just scrap this test - I did check it doesn't crash the system and it's really weird corner case, but I kinda would like to keep it.

Can I tell the XML to accept this somehow?


Solution

  • This is IMO valid XML

    That is NOT valid XML. The &#0; character is not a valid XML character and cannot be in an XML document. Any XML parser will fail to parse.

    Refer to the specs: https://www.w3.org/TR/xml/#charsets

    Character Range

    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

    That character is not allowable in XML. So, you may need to find a different way to indicate what the character is if it's outside the allowable range.