Cannot compile Yesod: hGetContents: invalid argument

I am trying to create a Yesod project and cannot get past the first step.

Below are the steps I took to initialize and build:

$ stack new someproj yesodweb/postgres
$ cd someproj
$ stack build

I have done no extra coding, just using the boilerplate, and I get the compilation error below when building.

language-javascript  > configure
language-javascript  > Configuring language-javascript-0.7.1.0...
language-javascript  > build
language-javascript  > Preprocessing library for language-javascript-0.7.1.0..
language-javascript  > happy: src/Language/JavaScript/Parser/Grammar7.y:
                       hGetContents: invalid argument (invalid byte sequence)
          .
          .
          .
--  While building package language-javascript-0.7.1.0 
    (scroll up to its section to see the error) using:
        /root/.stack/setup-exe-cache/x86_64-linux-tinfo6/Cabal-simple_mPHDZzAJ_3.0.1.0_ghc-8.8.4
        --builddir=.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.0.1.0 build
        --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1

I am on Fedora, using Stack 2.7.3.

I have been working on this for 2 days and still cannot compile it. Does anybody have any insight into what the problem is? (Or maybe Haskell and Yesod are still immature and are just not ready for production?)

Solution

Well, let's try again...

Short answer: Before running stack build, set your locale to en_US.UTF-8 or any other UTF-8 locale, either system-wide by editing /etc/locale.conf:

# /etc/locale.conf
LANG=en_US.UTF-8

or by setting it in the shell:

$ export LANG=en_US.UTF-8
$ cd someproj
$ stack build
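
If the export seems to have no effect, it may be worth checking which variable actually decides the encoding: LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG, so a stray LC_ALL=C would override the LANG setting above. The following is a minimal sketch, not anything Stack provides. The helper name uses_utf8 is made up for illustration, and it only inspects the locale's name rather than asking the C library.

```shell
# Hypothetical helper (not part of Stack): succeed if the effective
# locale name advertises UTF-8.
uses_utf8() {
  # Precedence for the character encoding: LC_ALL, then LC_CTYPE, then LANG.
  effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
  case "$effective" in
    *.[Uu][Tt][Ff]-8* | *.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn before building instead of failing halfway through the dependencies.
uses_utf8 || echo "locale is not UTF-8; happy may fail on Grammar7.y" >&2
```

Running locale -a lists the locales installed on the machine, in case en_US.UTF-8 is not among them.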

Long explanation: It looks like you've run up against a bug for which no one ... will accept ... responsibility.

The file Grammar7.y in the package language-javascript contains a single non-ASCII character, encoded as UTF-8, so it needs to be read by the Happy parser generator using UTF-8 encoding. However, by default, Happy reads its input files using the system encoding from the locale, and the Happy folks think that's the correct behavior.
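
To see the offending character for yourself, you can search the file for bytes outside the ASCII range. This is a rough sketch, assuming a grep that treats every byte as a character in the C locale (recent GNU grep does). The helper name non_ascii_lines is made up for illustration, and the path in the usage comment is relative to an unpacked copy of the language-javascript source.

```shell
# Hypothetical helper: print "LINE:TEXT" for every line that contains a
# byte outside printable ASCII and whitespace.
non_ascii_lines() {
  # LC_ALL=C makes grep work byte by byte instead of decoding characters.
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Usage, from inside the unpacked package source:
#   non_ascii_lines src/Language/JavaScript/Parser/Grammar7.y
```

Any line it prints contains bytes that a non-UTF-8 locale cannot decode, which is what makes hGetContents give up with "invalid byte sequence".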

Both Cabal and Stack are supposed to reliably build source packages downloaded from the Internet on local machines, and so they ought to build packages by reading source files using the encoding of the source files, rather than the encoding from the local machine's locale.

For Haskell source files, there's no problem because GHC ignores the system encoding and reads all source files as UTF-8. This is the sensible thing for a compiler/interpreter/language standard to do in our Internet-connected age -- either specify a fixed encoding for all source files (e.g., GHC, Python) or provide a mechanism to specify the encoding in the file itself (e.g., HTML, LaTeX).

Since Grammar7.y is a Happy file, and the Happy folks have chosen the non-sensible approach, it falls to either Stack or Cabal to ensure that -- during the build process -- non-Haskell source files are read with the correct encoding. They could do this by either decreeing that Stack and Cabal will always build packages with the system encoding overridden to UTF-8, or by providing some mechanism in the xxx.cabal file for specifying the package's encoding and setting it on a per-package basis. Unfortunately, neither group of developers wants to accept this responsibility.

So, we're stuck in a ridiculous situation where Stack can only reliably build packages when the system locale is configured for UTF-8 encoding, but the Stack developers think Stack shouldn't override the system locale during builds to ensure it's configured for UTF-8 encoding.