haskellmonadsmonad-transformerscategory-theory

Why are monad transformers different to stacking monads?


In many cases, it isn't clear to me what is to be gained by combining two monads with a transformer rather than using two separate monads. Obviously, using two separate monads is a hassle and can involve do notation inside do notation, but are there cases where it just isn't expressive enough?

One case seems to be StateT on List: combining monads doesn't get you the right type, and if you do obtain the right type via a stack of monads like Bar (where Bar a = (Reader r (List (Writer w (Identity a))), it doesn't do the right thing.

But I'd like a more general and technical understanding of exactly what monad transformers are bringing to the table, when they are and aren't necessary, and why.

To make this question a little more focused:

  1. What is an actual example of a monad with no corresponding transformer (this would help illustrate what transformers can do that just stacking monads can't).
  2. Are StateT and ContT the only transformers that give a type not equivalent to the composition of them with m, for an underlying monad m (regardless of which order they're composed.)

(I'm not interested in particular implementation details as regards different choices of libraries, but rather the general (and probably Haskell independent) question of what monad transformers/morphisms are adding as an alternative to combining effects by stacking a bunch of monadic type constructors.)

(To give a little context, I'm a linguist who's doing a project to enrich Montague grammar - simply typed lambda calculus for composing word meanings into sentences - with a monad transformer stack. It would be really helpful to understand whether transformers are actually doing anything useful for me.)

Thanks,

Reuben


Solution

  • To answer you question about the difference between Writer w (Maybe a) vs MaybeT (Writer w) a, let's start by taking a look at the definitions:

    newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }
    type Writer w = WriterT w Identity
    
    newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }
    

    Using ~~ to mean "structurally similar to" we have:

    Writer w (Maybe a)  == WriterT w Identity (Maybe a)
                        ~~ Identity (Maybe a, w)
                        ~~ (Maybe a, w)
    
    MaybeT (Writer w) a ~~ (Writer w) (Maybe a)
                        == Writer w (Maybe a)
                        ... same derivation as above ...
                        ~~ (Maybe a, w)
    

    So in a sense you are correct -- structurally both Writer w (Maybe a) and MaybeT (Writer w) a are the same - both are essentially just a pair of a Maybe value and a w.

    The difference is how we treat them as monadic values. The return and >>= class functions do very different things depending on which monad they are part of.

    Let's consider the pair (Just 3, []::[String]). Using the association we have derived above here's how that pair would be expressed in both monads:

    three_W :: Writer String (Maybe Int)
    three_W = return (Just 3)
    
    three_M :: MaybeT (Writer String) Int
    three_M = return 3
    

    And here is how we would construct a the pair (Nothing, []):

    nutin_W :: Writer String (Maybe Int)
    nutin_W = return Nothing
    
    nutin_M :: MaybeT (Writer String) Int
    nutin_M = MaybeT (return Nothing)   -- could also use mzero
    

    Now consider this function on pairs:

    add1 :: (Maybe Int, String) -> (Maybe Int, String)
    add1 (Nothing, w) = (Nothing w)
    add1 (Just x, w)  = (Just (x+1), w)
    

    and let's see how we would implement it in the two different monads:

    add1_W :: Writer String (Maybe Int) -> Writer String (Maybe Int)
    add1_W e = do x <- e
                 case x of
                   Nothing -> return Nothing
                   Just y  -> return (Just (y+1))
    
    add1_M :: MaybeT (Writer String) Int -> MaybeT (Writer String) Int
    add1_M e = do x <- e; return (e+1)
      -- also could use: fmap (+1) e
    

    In general you'll see that the code in the MaybeT monad is more concise.

    Moreover, semantically the two monads are very different...

    MaybeT (Writer w) a is a Writer-action which can fail, and the failure is automatically handled for you. Writer w (Maybe a) is just a Writer action which returns a Maybe. Nothing special happens if that Maybe value turns out to be Nothing. This is exemplified in the add1_W function where we had to perform a case analysis on x.

    Another reason to prefer the MaybeT approach is that we can write code which is generic over any monad stack. For instance, the function:

    square x = do tell ("computing the square of " ++ show x)
                  return (x*x)
    

    can be used unchanged in any monad stack which has a Writer String, e.g.:

    WriterT String IO
    ReaderT (WriterT String Maybe)
    MaybeT (Writer String)
    StateT (WriterT String (ReaderT Char IO))
    ...
    

    But the return value of square does not type check against Writer String (Maybe Int) because square does not return a Maybe.

    When you code in Writer String (Maybe Int), you code explicitly reveals the structure of monad making it less generic. This definition of add1_W:

    add1_W e = do x <- e 
                  return $ do 
                    y <- x 
                    return $ y + 1
    

    only works in a two-layer monad stack whereas a function like square works in a much more general setting.