I am new to Haskell and learning how to use haskell writing a compiler. I rewrote those code in "Monadic Parser Combinators" using a monad syntax like below. Everything seemed working well but I got a confusing output of traceM
when I try to trace the execution of word
. Why did "leave do" appear three times but "enter do" only appeared once? (the ghc version is 9.8.2 and both ghc and ghci outputs the same)
code:
import Debug.Trace
newtype Parser a = Parser {parse :: String -> [(a, String)]}
zero :: Parser a
zero = Parser $ \_ -> []
item :: Parser Char
item = Parser $ \input -> case input of
[] -> []
x : xs -> [(x, xs)]
instance Functor Parser where
fmap f parser = Parser $ \input -> [(f result, input') | (result, input') <- parse parser input]
instance Applicative Parser where
pure x = Parser $ \input -> [(x, input)]
p1 <*> p2 = Parser $ \input -> [(mapper result, input'') | (mapper, input') <- parse p1 input, (result, input'') <- parse p2 input']
instance Monad Parser where
return = pure
parser >>= f = Parser $ \input -> concat [parse (f result) input' | (result, input') <- parse parser input]
sat :: (Char -> Bool) -> Parser Char
sat predi = item >>= \x -> if predi x then return x else zero
infixl 0 |:
(|:) :: Parser a -> Parser a -> Parser a
p1 |: p2 = Parser $ \input -> parse p1 input ++ parse p2 input
lower :: Parser Char
lower = sat $ \x -> x >= 'a' && x <= 'z'
upper :: Parser Char
upper = sat $ \x -> x >= 'A' && x <= 'Z'
letter :: Parser Char
letter = lower |: upper
word :: Parser String
word = return "" |: do
traceM "enter do"
x <- letter
traceM $ "x = " ++ show x
xs <- word
traceM $ "xs = " ++ show xs
traceM "leave do"
return (x:xs)
main = putStrLn $ show $ parse word "ab"
output of traceM
:
enter do
x = 'a'
xs = ""
leave do
x = 'b'
xs = ""
leave do
xs = "b"
leave do
my expected output of traceM
is like the following(and I'm not sure if I'm wrong):
enter do
x = "a"
enter do
x = "b"
xs = ""
leave do
xs = ""
xs = "b"
leave do
update: I know that <-
is not assignment which is actually a syntactic sugar of bind
. To be more accurate, I wonder why "enter do" is always printed once(no matter how long the length of input string is), since the word
function is definitely called many times.
update: Thanks for comment of @Naïm. I got the "satisfactory" trace info after inserting =<< pure
behind traceM
. Here is the trace info:
enter do
x = 'a'
xs = ""
leave do
enter do
x = 'b'
xs = ""
leave do
xs = "b"
leave do
enter do
I'm going to copy a section from an older answer I wrote.
The basic distinction you need to draw here is right here in this significantly simplified example:
> g = trace "a" $ \() -> trace "b" () > g () a b () > g () b ()
There is a separate notion of caching a function and caching its output. The latter is, simply, never done in GHC (though see discussion of what's going on with your optimized version below). The former may sound dumb, but it in fact is not so dumb as you might think; you could imagine writing a function which is, say,
id
if the collatz conjecture is true andnot
otherwise. In such a situation, it makes complete sense to only test the collatz conjecture once, and then cache whether we should behave asid
ornot
forever afterwards.
I suspect, though I am not certain, that something related is happening here. When you desugar your do
block, you can see that it is a similar situation here: some of your trace
s are inside a lambda, and some are not.
do
traceM "enter do" traceM "enter do" >> -- outside?
x <- letter ==> letter >>= \x ->
traceM $ "x = " ++ show x traceM ("x = " ++ show x) -- clearly inside
However, I think that's not the whole story. There's still something I don't understand going on here. The reason I say that is that your Monad
and Applicative
method implementations all put things under lambdas in a way that I would expect to cause the traceM
to be executed multiple times. I suspected that the optimizer was pulling the traceM ...
bit out, but -ddump-simpl
doesn't agree with me; we can see that traceM "enter do" >> letter
is pulled out as a separate subexpression, but my read of the core suggests that this should still execute traceM
once per invocation of word_rhN
. So I'm stumped at this point. Here's the chunk where the first part of your block has been pulled out, with the important parts italicized:
p2_r1pL :: Parser String [GblId] p2_r1pL = $c>>=_r1pf @() @String (traceM @Parser Main.$fApplicativeParser (GHC.CString.unpackCString# "enter do"#)) (\ _ [Occ=Dead] -> k_r1pK)
From the start of the definition of $c>>=_r1pf
, we can see that the evaluation of traceM
shouldn't be shared between separate evaluations of p2_r1pL
. I've elided some distracting bits.
$c>>=_r1pf :: forall a b. Parser a -> (a -> Parser b) -> Parser b
[...]
$c>>=_r1pf
= \ (@a_a1gY)
(@b_a1gZ)
(parser_axd :: Parser a_a1gY)
(f_axe :: a_a1gY -> Parser b_a1gZ) ->
$ @...
@...
@...
@...
((\ (ds_d1o8 :: ...) -> ds_d1o8)
`cast` ...
(\ (input_axf :: String) -> ...)
So even applied to two arguments as it is in p2_r1pL
, this should have a lambda outermost, and traceM
should appear only inside that lambda. ¯\_(ツ)_/¯