Tags: haskell, conduit, haskell-pipes, http-conduit, lazy-io

Why doesn't print force entire lazy IO value?


I'm following the http-client tutorial to fetch a response body over a TLS connection. Since I can see that print is called inside the withResponse callback, why doesn't print force the entire response to the output in the following fragment?

withResponse request manager $ \response -> do
    putStrLn $ "The status code was: " ++
    body <- (responseBody response)
    print body

I need to write this instead:

response <- httpLbs request manager

putStrLn $ "The status code was: " ++
           show (statusCode $ responseStatus response)
print $ responseBody response

The body I want to print is a lazy ByteString. I'm still not sure whether I should expect print to print the entire value, given its Show instance:

instance Show ByteString where
    showsPrec p ps r = showsPrec p (unpackChars ps) r
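
As a quick sanity check (separate from the HTTP code above), print on a lazy ByteString I build by hand does show every chunk, so the Show instance itself doesn't seem to be the issue:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as L

-- Show for a lazy ByteString unpacks every chunk, so print emits the whole value.
main :: IO ()
main = print (L.fromChunks ["In the beginning ", "God created ", "the heaven and the earth."])
-- prints: "In the beginning God created the heaven and the earth."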

Solution

  • This doesn't have to do with laziness, but with the difference between the Response L.ByteString you get with the Simple module, and the Response BodyReader you get with the TLS module.
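
    For reference, here is a minimal sketch of that first case, assuming the "Simple module" is Network.HTTP.Simple from http-conduit: there responseBody is already a complete lazy ByteString, so printing it emits the whole body at once.

    import Network.HTTP.Simple (httpLBS, getResponseStatusCode, getResponseBody)
    import Network.HTTP.Client (parseRequest)
    import qualified Data.ByteString.Lazy.Char8 as L
    
    main = do
      request  <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
      response <- httpLBS request
      putStrLn $ "The status code was: " ++ show (getResponseStatusCode response)
      L.putStrLn (getResponseBody response)   -- the whole (lazy) body, printed in one go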

    You noticed that a BodyReader is an IO ByteString. But in particular it is an action that can be repeated, each time yielding the next chunk of bytes. It follows the protocol that it never returns a null bytestring except at the end of the file. (BodyReader might have been called ChunkGetter.) bip below is like what you wrote: after extracting the BodyReader/IO ByteString from the Response, it performs it once to get the first chunk, and prints it. But it doesn't repeat the action to get more, so in this case we just see the first couple of chapters of Genesis. What you need is a loop to exhaust the chunks, as in bop below, which causes the whole King James Bible to spill into the console.

    {-# LANGUAGE OverloadedStrings #-} 
    import Network.HTTP.Client
    import Network.HTTP.Client.TLS
    import qualified Data.ByteString.Char8 as B
    
    main = bip
    -- main = bop
    
    bip = do 
      manager <- newManager tlsManagerSettings
      request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
      withResponse request manager $ \response -> do
          putStrLn "The status code was: "  
          print (responseStatus response)
          chunk  <- responseBody response
          B.putStrLn chunk
    
    bop = do 
      manager <- newManager tlsManagerSettings
      request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
      withResponse request manager $ \response -> do
          putStrLn "The status code was: " 
          print (responseStatus response)
          let loop = do 
                chunk <- responseBody response
                if B.null chunk 
                  then return () 
                  else B.putStr chunk  >> loop 
          loop
    

    The loop keeps going back to get more chunks until it gets an empty string, which represents eof, so in the terminal it prints through to the end of the Apocalypse.
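
    As an aside, http-client also exports brConsume :: BodyReader -> IO [ByteString], which performs exactly this chunk-reading loop for you and hands back the collected chunks. A sketch doing the same thing as bop:

    import Network.HTTP.Client
    import Network.HTTP.Client.TLS
    import qualified Data.ByteString.Char8 as B
    
    bop' = do
      manager <- newManager tlsManagerSettings
      request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
      withResponse request manager $ \response -> do
          print (responseStatus response)
          chunks <- brConsume (responseBody response)  -- loops internally until the empty eof chunk
          B.putStr (B.concat chunks)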

    This behavior is straightforward but slightly technical. You can only work with a BodyReader by hand-written recursion. But the purpose of the http-client library is to make things like http-conduit possible. There the result of withResponse has the type Response (ConduitM i ByteString m ()). ConduitM i ByteString m () is how conduit types a byte stream; this byte stream would contain the whole file.
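
    A hedged sketch of that http-conduit route (combinator names as in current conduit and http-conduit), where the whole byte stream is piped to stdout with no hand-written recursion:

    import Network.HTTP.Client     (newManager, parseRequest, responseBody, responseStatus)
    import Network.HTTP.Client.TLS (tlsManagerSettings)
    import Network.HTTP.Conduit    (http)
    import Conduit                 (runConduit, runResourceT, (.|), stdoutC)
    import Control.Monad.IO.Class  (liftIO)
    
    main = do
      manager <- newManager tlsManagerSettings
      request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
      runResourceT $ do
          response <- http request manager               -- Response (ConduitM i ByteString m ())
          liftIO $ print (responseStatus response)
          runConduit $ responseBody response .| stdoutC  -- stream every chunk to stdout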

    In the original form of the http-client/http-conduit material, the Response contained a conduit like this; the BodyReader part was later factored out into http-client so it could be used by different streaming libraries like pipes.

    So to take a simple example, in the corresponding http material for the streaming and streaming-bytestring libraries, withHTTP gives you a response of type Response (ByteString IO ()). ByteString IO () is the type of a stream of bytes arising in IO, as its name suggests; ByteString Identity () would be the equivalent of a lazy bytestring (effectively a pure list of chunks). The ByteString IO () will in this case represent the whole byte stream down to the Apocalypse. So with the imports

     import qualified Data.ByteString.Streaming.HTTP as Bytes -- streaming-utils
     import qualified Data.ByteString.Streaming.Char8 as Bytes -- streaming-bytestring
    

    the program is identical to a lazy bytestring program:

    bap = do 
        manager <- newManager tlsManagerSettings
        request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
        Bytes.withHTTP request manager $ \response -> do 
            putStrLn "The status code was: "
            print (responseStatus response)
            Bytes.putStrLn $ responseBody response
    

    Indeed it is slightly simpler, since you don't have to "extract the bytes from IO":

            lazy_bytes <- responseBody response
            Lazy.putStrLn lazy_bytes
    

    but just write

            Bytes.putStrLn $ responseBody response
    

    you just "print" them directly. If you want to view just a bit from the middle of the KJV, you can instead do what you would with a lazy bytestring, and end with:

            Bytes.putStrLn $ Bytes.take 1000 $ Bytes.drop 50000 $ responseBody response
    

    Then you will see something about Abraham.

    The withHTTP for streaming-bytestring just hides the recursive looping that we needed in order to use the BodyReader material from http-client directly. It's the same with the withHTTP you find in pipes-http, which represents a stream of bytestring chunks as Producer ByteString IO (), and the same again with http-conduit. In all of these cases, once you have your hands on the byte stream you handle it in the ways typical of the streaming-IO framework, without hand-written recursion. All of them use the BodyReader from http-client to do this; that was the main purpose of the library.
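
    For completeness, a comparable sketch with pipes-http and pipes-bytestring: withHTTP hands you the body as a Producer ByteString IO (), which is then piped to stdout.

    import Pipes                   (runEffect, (>->))
    import Pipes.HTTP              (withHTTP)            -- pipes-http
    import qualified Pipes.ByteString as PB              -- pipes-bytestring
    import Network.HTTP.Client     (newManager, parseRequest, responseBody, responseStatus)
    import Network.HTTP.Client.TLS (tlsManagerSettings)
    
    main = do
      manager <- newManager tlsManagerSettings
      request <- parseRequest "https://raw.githubusercontent.com/michaelt/kjv/master/kjv.txt"
      withHTTP request manager $ \response -> do
          print (responseStatus response)
          runEffect $ responseBody response >-> PB.stdout  -- stream every chunk to stdout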