
How to show UTF-8 text with Snap and Heist?


I have used writeBS and writeText from Snap and renderTemplate from Heist, but none of them seems to support Unicode.

site :: Snap ()
site = do
    ifTop (writeBS "你好世界") <|>
    route [("test", testSnap)]

testSnap :: Snap ()
testSnap = do
     fromJust $ C.renderTemplate hs "test"

-- test.tpl

你好世界

I expected it to output "你好世界" for the / and /test routes, but the actual output is garbled text.


Solution

  • The problem here is not with writeBS or writeText but with the conversion performed by the OverloadedStrings extension. It is also important to understand the distinction between ByteString and Text. A ByteString holds raw bytes, with no concept of characters or encoding; Text is the type for characters. The Data.Text.Encoding module provides functions for converting between Text and ByteString using different encodings. For me, both of the following generate the same output:

    writeBS $ encodeUtf8 "你好世界"
    writeText "你好世界"
    

    The reason your code didn't work is that your string literal is converted to a ByteString by the OverloadedStrings extension, and that conversion does not UTF-8 encode it: ByteString's IsString instance keeps only the low 8 bits of each character, so non-ASCII characters are mangled. The solution is to treat the literal as the proper type, Text.
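
To see the difference between the two conversions without running a server, here is a minimal sketch that uses only the bytestring and text packages. The names truncatedBytes and utf8Bytes are illustrative and do not appear in the question; the program simply prints the bytes each conversion produces.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- The literal is typed as ByteString, so OverloadedStrings goes through
-- ByteString's IsString instance, which keeps only the low 8 bits of
-- each Char. This is what writeBS "你好世界" ends up sending.
truncatedBytes :: B.ByteString
truncatedBytes = "你好世界"

-- The literal is typed as Text first and then explicitly encoded as UTF-8.
utf8Bytes :: B.ByteString
utf8Bytes = encodeUtf8 ("你好世界" :: T.Text)

main :: IO ()
main = do
  -- One byte per character: the upper bits of every code point are lost.
  print (B.length truncatedBytes, B.unpack truncatedBytes)
  -- Three bytes per character: a valid UTF-8 encoding of the same text.
  print (B.length utf8Bytes, B.unpack utf8Bytes)
```

The first line reports 4 bytes and the second 12, which is why only the second can be decoded back into the original characters.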

    On the Heist side of things, the following works fine for me:

    route [("test", cRender "test")]
    

    In fact, this one renders correctly in my browser, while the previous two don't. The difference is that cRender sets an appropriate Content-Type. I found it enlightening to observe the differences using the following snippet:

    site = route [ ("/test1", writeBS "你好世界")
                 , ("/test2", writeBS $ encodeUtf8 "你好世界")
                 , ("/test3", writeText "你好世界")
                 , ("/test4", modifyResponse (setContentType "text/html;charset=utf-8") >> writeText "你好世界")
                 , ("/testHeist", cRender "test")
                 ]
    

    In my browser, test4 and testHeist render correctly. test1 sends the truncated bytes, so it is garbled regardless of headers. test2 and test3 send the correct UTF-8 bytes but might not be rendered properly by browsers because no content-type declaring the charset is set.
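
The test4 approach can be wrapped in a small helper so that every plain handler declares its charset. The following is only a sketch, assuming the snap-core and snap-server packages; utf8Html and writeUtf8Html are hypothetical names introduced here, not part of Snap.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Text (Text)
import Snap.Core
import Snap.Http.Server (quickHttpServe)

-- Hypothetical helper: mark the response as UTF-8 encoded HTML.
utf8Html :: Response -> Response
utf8Html = setContentType "text/html; charset=utf-8"

-- Hypothetical helper: declare the charset, then write the Text body.
-- Taking Text (not ByteString) keeps the literal from being truncated.
writeUtf8Html :: MonadSnap m => Text -> m ()
writeUtf8Html body = modifyResponse utf8Html >> writeText body

-- Same routes as in the question, without the Heist template.
site :: Snap ()
site = ifTop (writeUtf8Html "你好世界")
   <|> route [("test", writeUtf8Html "你好世界")]

main :: IO ()
main = quickHttpServe site
```

Because the helper takes Text, passing a string literal under OverloadedStrings selects Text's IsString instance, which keeps the full code points.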