I'm trying to use Data.Aeson (https://hackage.haskell.org/package/aeson-0.6.1.0/docs/Data-Aeson.html) to decode some JSON strings, but it fails to parse strings that contain non-ASCII characters.
As an example, the file:
import Data.Aeson
import Data.ByteString.Lazy.Char8 (pack)
test1 :: Maybe Value
test1 = decode $ pack "{ \"foo\": \"bar\"}"
test2 :: Maybe Value
test2 = decode $ pack "{ \"foo\": \"bòz\"}"
When run in ghci, gives the following results:
*Main> :l ~/test.hs
[1 of 1] Compiling Main ( /Users/ltomlin/test.hs, interpreted )
Ok, modules loaded: Main.
*Main> test1
Just (Object fromList [("foo",String "bar")])
*Main> test2
Nothing
Is there a reason that it doesn't parse the String with the unicode character? I was under the impression that Haskell was pretty good with unicode. Any suggestions would be greatly appreciated!
Thanks,
tetigi
Upon further investigation using eitherDecode, I get the following error message:
*Main> test2
Left "Failed reading: Cannot decode byte '\\x61': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream"
x61 is the Unicode character for 'z', which comes right after the special Unicode character. Not sure why it's failing to read the characters after the special character!
Changing test2 to be test2 = decode $ pack "{ \"foo\": \"bòz\"}" instead gives the error:
Left "Failed reading: Cannot decode byte '\\xf2': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream"
xf2 is the code point for "ò", which makes a bit more sense.
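The problem is your usage of pack from the Char8 module. It does not encode anything: it truncates every Char to its low 8 bits, so "ò" (U+00F2) becomes the single byte 0xF2. Aeson expects UTF-8 input, and 0xF2 on its own is not a valid UTF-8 sequence. Instead, use encodeUtf8 from text.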
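To make the difference concrete, here is a minimal sketch that compares the bytes each approach produces. The helper names truncatedBytes, utf8Bytes and isValidUtf8 are made up for illustration; only the bytestring and text functions they call are real library functions.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Either (isRight)
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Data.Word (Word8)

-- Hypothetical helper: the bytes produced by Char8.pack, which keeps
-- only the low 8 bits of each Char.
truncatedBytes :: String -> [Word8]
truncatedBytes = BL.unpack . BLC.pack

-- Hypothetical helper: the bytes produced by a real UTF-8 encoding.
utf8Bytes :: String -> [Word8]
utf8Bytes = BL.unpack . TLE.encodeUtf8 . TL.pack

-- Hypothetical helper: True when the bytes form a valid UTF-8 stream.
isValidUtf8 :: BL.ByteString -> Bool
isValidUtf8 = isRight . TLE.decodeUtf8'
```

In ghci, truncatedBytes "ò" gives the single byte 0xF2, while utf8Bytes "ò" gives the two bytes 0xC3 0xB2. isValidUtf8 is False for BLC.pack "bòz" and True for the encodeUtf8 version of the same string.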
You can write your examples like this:
import Data.Aeson
import Data.Text.Lazy (pack)
import Data.Text.Lazy.Encoding (encodeUtf8)
test1 :: Maybe Value
test1 = decode $ encodeUtf8 $ pack "{ \"foo\": \"bar\"}"
test2 :: Maybe Value
test2 = decode $ encodeUtf8 $ pack "{ \"foo\": \"bòz\"}"
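If you want to keep the error message around, as you did with eitherDecode, the same fix applies. This is only a sketch: decodeUtf8Json and decodeChar8Json are names invented here, and the exact wording of the Left message depends on the aeson version.

```haskell
import Data.Aeson (Value, eitherDecode)
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Text.Lazy (pack)
import Data.Text.Lazy.Encoding (encodeUtf8)

-- Hypothetical helper: encode the String as UTF-8 before handing it to aeson.
decodeUtf8Json :: String -> Either String Value
decodeUtf8Json = eitherDecode . encodeUtf8 . pack

-- Hypothetical helper: the approach from the question, kept for comparison.
-- It only yields valid UTF-8 when every character is ASCII.
decodeChar8Json :: String -> Either String Value
decodeChar8Json = eitherDecode . BLC.pack
```

decodeUtf8Json "{ \"foo\": \"bòz\"}" returns a Right value, while decodeChar8Json on the same input should fail the way test2 does in the question.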