haskellmonadstrace

Confused output of `traceM` in recursive call on function monad


I am new to Haskell and learning how to use haskell writing a compiler. I rewrote those code in "Monadic Parser Combinators" using a monad syntax like below. Everything seemed working well but I got a confusing output of traceM when I try to trace the execution of word. Why did "leave do" appear three times but "enter do" only appeared once? (the ghc version is 9.8.2 and both ghc and ghci outputs the same)

code:

import Debug.Trace
newtype Parser a = Parser {parse :: String -> [(a, String)]}

zero :: Parser a
zero = Parser $ \_ -> []

item :: Parser Char
item = Parser $ \input -> case input of
                                [] -> []
                                x : xs -> [(x, xs)]

instance Functor Parser where
  fmap f parser = Parser $ \input -> [(f result, input') | (result, input') <- parse parser input]

instance Applicative Parser where
  pure x = Parser $ \input -> [(x, input)]
  p1 <*> p2 = Parser $ \input -> [(mapper result, input'') | (mapper, input') <- parse p1 input, (result, input'') <- parse p2 input']

instance Monad Parser where
  return = pure
  parser >>= f = Parser $ \input -> concat [parse (f result) input' | (result, input') <- parse parser input]

sat :: (Char -> Bool) -> Parser Char
sat predi = item >>= \x -> if predi x then return x else zero

infixl 0 |:
(|:) :: Parser a -> Parser a -> Parser a
p1 |: p2 = Parser $ \input -> parse p1 input ++ parse p2 input 

lower :: Parser Char
lower = sat $ \x -> x >= 'a' && x <= 'z'

upper :: Parser Char
upper = sat $ \x -> x >= 'A' && x <= 'Z'

letter :: Parser Char
letter = lower |: upper

word :: Parser String
word = return "" |: do
  traceM "enter do"
  x <- letter
  traceM $ "x = " ++ show x
  xs <- word
  traceM $ "xs = " ++ show xs
  traceM "leave do" 
  return (x:xs)
main = putStrLn $ show $ parse word "ab"

output of traceM:

enter do
x = 'a'
xs = ""
leave do
x = 'b'
xs = ""
leave do
xs = "b"
leave do

my expected output of traceM is like the following(and I'm not sure if I'm wrong):

enter do
x = "a"
enter do
x = "b"
xs = ""
leave do
xs = ""
xs = "b"
leave do

update: I know that <- is not assignment which is actually a syntactic sugar of bind. To be more accurate, I wonder why "enter do" is always printed once(no matter how long the length of input string is), since the word function is definitely called many times.

update: Thanks for comment of @Naïm. I got the "satisfactory" trace info after inserting =<< pure behind traceM. Here is the trace info:

enter do
x = 'a'
xs = ""
leave do
enter do
x = 'b'
xs = ""
leave do
xs = "b"
leave do
enter do

Solution

  • I'm going to copy a section from an older answer I wrote.

    The basic distinction you need to draw here is right here in this significantly simplified example:

    > g = trace "a" $ \() -> trace "b" ()
    > g ()
    a
    b
    ()
    > g ()
    b
    ()
    

    There is a separate notion of caching a function and caching its output. The latter is, simply, never done in GHC (though see discussion of what's going on with your optimized version below). The former may sound dumb, but it in fact is not so dumb as you might think; you could imagine writing a function which is, say, id if the collatz conjecture is true and not otherwise. In such a situation, it makes complete sense to only test the collatz conjecture once, and then cache whether we should behave as id or not forever afterwards.

    I suspect, though I am not certain, that something related is happening here. When you desugar your do block, you can see that it is a similar situation here: some of your traces are inside a lambda, and some are not.

    do
      traceM "enter do"             traceM "enter do" >> -- outside?
      x <- letter               ==> letter >>= \x ->
      traceM $ "x = " ++ show x     traceM ("x = " ++ show x) -- clearly inside
    

    However, I think that's not the whole story. There's still something I don't understand going on here. The reason I say that is that your Monad and Applicative method implementations all put things under lambdas in a way that I would expect to cause the traceM to be executed multiple times. I suspected that the optimizer was pulling the traceM ... bit out, but -ddump-simpl doesn't agree with me; we can see that traceM "enter do" >> letter is pulled out as a separate subexpression, but my read of the core suggests that this should still execute traceM once per invocation of word_rhN. So I'm stumped at this point. Here's the chunk where the first part of your block has been pulled out, with the important parts italicized:

    p2_r1pL :: Parser String
    [GblId]
    p2_r1pL
      = $c>>=_r1pf
          @()
          @String
          (traceM
             @Parser
             Main.$fApplicativeParser
             (GHC.CString.unpackCString# "enter do"#))
          (\ _ [Occ=Dead] -> k_r1pK)
    

    From the start of the definition of $c>>=_r1pf, we can see that the evaluation of traceM shouldn't be shared between separate evaluations of p2_r1pL. I've elided some distracting bits.

    $c>>=_r1pf :: forall a b. Parser a -> (a -> Parser b) -> Parser b
    [...]
    $c>>=_r1pf
      = \ (@a_a1gY)
          (@b_a1gZ)
          (parser_axd :: Parser a_a1gY)
          (f_axe :: a_a1gY -> Parser b_a1gZ) ->
          $ @...
            @...
            @...
            @...
            ((\ (ds_d1o8 :: ...) -> ds_d1o8)
             `cast` ...
            (\ (input_axf :: String) -> ...)
    

    So even applied to two arguments as it is in p2_r1pL, this should have a lambda outermost, and traceM should appear only inside that lambda. ¯\_(ツ)_/¯