Using HXT I'm parsing the XML response body of an HTTP call that was made using http-conduit.
val <- runX $ readString [withValidate no] (Data.ByteString.UTF8.toString . toStrict $ getResponseBody response) >>> getChildren >>> ...
Depending on the version of the API, I found that the response body includes a byte order mark before the XML:
error: ""\65279<?xml version=\"1.0\" encoding=\"utf-8\"?><Enume..."" (line 1, column 1):
unexpected "\65279"
expecting xml declaration, comment, processing instruction, "<!DOCTYPE" or "<"
Since the BOM may or may not be there, I did the following:
...
let resBody = Data.ByteString.UTF8.toString . toStrict $ getResponseBody response
parseBody body = runX $ readString [withValidate no] body >>> getChildren >>> ...
xs <- parseBody resBody
val <- case xs of
x : _ -> pure x
_ -> head <$> (parseBody $ drop 1 resBody)
...
It works, but it's printing the error message when the BOM is present. What are the options for parsing the XML with a possible BOM so that it's not printing error messages?
Okay, given that you're willing to assume the encoding is UTF-8 as you do here, then probably the simplest is to just pattern match to discard a BOM:
case toString ... of
'\65279':s -> s
s -> s
As an aside, having just looked through the XML spec to see how encodings are supposed to be handled, let me just say: eugh, gross. There appears to be no encoding-agnostic way to specify what encoding to use, so the only really correct, robust thing to do is try a bunch during parsing and hope one succeeds.