json, parsing, haskell, utf-8, aeson

How can I use Aeson to decode JSON files with Unicode characters?


(Apologies if I should say UTF8 instead of Unicode.)

I've already come across this question, but couldn't make sense of it after several minutes.

The starting point of both that question and mine is the following:

import Data.Aeson
import qualified Data.Text.Encoding as E (encodeUtf8)
import qualified Data.ByteString.Lazy.Char8 as C8 (pack)
import qualified Data.Text as T (pack)

main :: IO ()
main = do
  print (foo "{ \"foo\": \"bar\"}" :: Maybe Value)
  print (foo "{ \"foo\": \"bòz\"}" :: Maybe Value)
  where
    foo = decode . C8.pack

will print

Just (Object (fromList [("foo",String "bar")]))
Nothing

The answer and the comments allude to the solution making use of encodeUtf8 from Data.Text.Encoding. But how?

So I think that if I want to use E.encodeUtf8, I need to replace C8.pack with a function of type String -> Text, and T.pack seems to be such a function. However, the following,

import Data.Aeson
import qualified Data.Text.Encoding as E (encodeUtf8)
import qualified Data.ByteString.Lazy.Char8 as C8 (pack)
import qualified Data.Text as T (pack)

main :: IO ()
main = do
  print (foo "{ \"foo\": \"bar\"}" :: Maybe Value)
  print (foo "{ \"foo\": \"bòz\"}" :: Maybe Value)
  where
    foo = decode . E.encodeUtf8 . T.pack

fails to compile, with error

    • Couldn't match type ‘bytestring-0.11.5.3:Data.ByteString.Internal.Type.ByteString’
                     with ‘Data.ByteString.Lazy.Internal.ByteString’
      Expected: Data.Text.Internal.Text
                -> Data.ByteString.Lazy.Internal.ByteString
        Actual: Data.Text.Internal.Text
                -> bytestring-0.11.5.3:Data.ByteString.Internal.Type.ByteString
      NB: ‘Data.ByteString.Lazy.Internal.ByteString’
            is defined in ‘Data.ByteString.Lazy.Internal’
          ‘bytestring-0.11.5.3:Data.ByteString.Internal.Type.ByteString’
            is defined in ‘Data.ByteString.Internal.Type’
    • In the first argument of ‘(.)’, namely ‘E.encodeUtf8’
      In the second argument of ‘(.)’, namely ‘E.encodeUtf8 . T.pack’
      In the expression: decode . E.encodeUtf8 . T.pack
   |
11 |     foo = decode . E.encodeUtf8 . T.pack
   |                    ^^^^^^^^^^^^

Solution

  • The error message is saying that encodeUtf8 produces a strict ByteString while decode expects a lazy ByteString. You can probably fix it by using the lazy versions of both functions, for example by changing the imports:

    import qualified Data.Text.Lazy.Encoding as E (encodeUtf8)
    import qualified Data.Text.Lazy as T (pack)
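
For completeness, here is a minimal sketch of the whole program with the lazy imports applied. It also prints the bytes each approach produces, which seems to explain the original Nothing: C8.pack truncates every Char to 8 bits, so ò (U+00F2) becomes the single byte 242, which is not valid UTF-8, whereas encodeUtf8 emits the two-byte sequence 195, 178. The helper names truncatedBytes, utf8Bytes and decodeJsonString are made up for illustration, and the sketch assumes the source file itself is saved as UTF-8.

```haskell
import Data.Aeson (Value, decode)
import qualified Data.ByteString.Lazy as BL (unpack)
import qualified Data.ByteString.Lazy.Char8 as C8 (pack)
import qualified Data.Text.Lazy as TL (pack)
import qualified Data.Text.Lazy.Encoding as TLE (encodeUtf8)
import Data.Word (Word8)

-- Hypothetical helper: the bytes produced by the original approach.
-- C8.pack keeps only the low 8 bits of each Char.
truncatedBytes :: String -> [Word8]
truncatedBytes = BL.unpack . C8.pack

-- Hypothetical helper: the bytes produced by a real UTF-8 encoding.
utf8Bytes :: String -> [Word8]
utf8Bytes = BL.unpack . TLE.encodeUtf8 . TL.pack

-- Hypothetical helper: String -> lazy Text -> lazy UTF-8 ByteString -> JSON.
decodeJsonString :: String -> Maybe Value
decodeJsonString = decode . TLE.encodeUtf8 . TL.pack

main :: IO ()
main = do
  print (truncatedBytes "ò")  -- [242]
  print (utf8Bytes "ò")       -- [195,178]
  print (decodeJsonString "{ \"foo\": \"bar\"}")
  print (decodeJsonString "{ \"foo\": \"bòz\"}")
```

With this change both decode calls should return a Just value. If you would rather keep the strict E.encodeUtf8 and T.pack, Data.Aeson also exports decodeStrict, which accepts a strict ByteString. And when the JSON comes from a file rather than a String literal, the bytes on disk are presumably already UTF-8, so reading them as a ByteString (or using decodeFileStrict) should avoid the String round trip entirely.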