haskelllazy-io

reading files with references to other files in haskell


I am trying to expand regular markdown with the ability to have references to other files, such that the content in the referenced files is rendered at the corresponding places in the "master" file.

But the furthest I've come is to implement

createF :: FTree -> IO String
createF Null = return ""
createF (Node f children) = ifNExists f (_id f)
                              (do childStrings <- mapM createF children
                                  withFile (_path f) ReadMode $ \handle ->
                                    do fc <- lines <$> hGetContents handle
                                       return $ merge fc childStrings)

ifNExists is just a helper that can be ignored, the real problem happens in the reading of the handle, it just returns the empty string, I assume this is due to lazy IO.

I thought that the use of withFile filepath ReadMode $ \handle -> {-do stutff-}hGetContents handle would be the right solution as I've read fcontent <- withFile filepath ReadMode hGetContents is a bad idea.

Another thing that confuses me is that the function

createFT :: File -> IO FTree
createFT f = ifNExists f Null
               (withFile (_path f) ReadMode $ \handle ->
                  do let thisParse = fparse (_id f :_parents f)
                     children <-rights . map ( thisParse . trim) . lines <$> hGetContents handle
                     c <- mapM createFT children
                     return $ Node f c)

works like a charm.

So why does createF return just an empty string?

the whole project and a directory/file to test can be found at github


Here are the datatype definitions

type ID = String

data File = File {_id :: ID, _path :: FilePath, _parents :: [ID]}
          deriving (Show)
data FTree = Null
           | Node { _file :: File
                  , _children :: [FTree]} deriving (Show)

Solution

  • As you suspected, lazy IO is probably the problem. Here's the (awful) rule you have to follow to use it properly without going totally nuts:

    A withFile computation must not complete until all (lazy) I/O required to fully evaluate its result has been performed.

    If something forces I/O after the handle is closed, you are not guaranteed to get an error, even though that would be very nice. Instead, you get completely undefined behavior.

    You break this rule with return $ merge fc childStrings, because this value is returned before it's been fully evaluated. What you can do instead is something vaguely like

    let retVal = merge fc childStrings
    deepSeq retVal $ return retVal
    

    An arguably cleaner alternative is to put all the rest of the code that relies on those results into the withFile argument. The only real reason not to do that is if you do a bunch of other work with the results after you're finished with that file. For example, if you're processing a bunch of different files and accumulating their results, then you want to be sure to close each of them when you're done with it. If you're just reading in one file and then acting on it, you can leave it open till you're finished.


    By the way, I just submitted a feature request to the GHC team to see if they might be willing to make these kinds of programs more likely to fail early with useful error messages.


    Update

    The feature request was accepted, and such programs are now much more likely to produce useful error messages. See What caused this "delayed read on closed handle" error? for details.