haskellfunctional-programming

What's the difference between placing a parser in its "box" vs using it without it?


So this is my parser, and two functions to run a parser:

data Parser a = MkParser (String -> Maybe (String, a))


unParser :: Parser a -> String -> Maybe (String, a)
unParser (MkParser a) inp = a inp


runParser :: Parser a -> String -> Maybe a
runParser (MkParser pa) inp  = case pa inp of
    Nothing -> Nothing
    Just (cs, c) -> Just c

Now I have this parser, that parses a string twice with two different parses, and combined their answers with a function:

liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftA2 op (MkParser pa) (MkParser pb) = MkParser sf
    where
        sf inp = case pa inp of
            Nothing -> Nothing
            Just (as, a) -> case pb as of
                Nothing -> Nothing
                Just (bs, b) -> Just (bs, op a b)

I also have this code to run a parser 0 or more times.

-- try pa, if it fails, try pa2
(<|>) :: Parser a -> Parser a -> Parser a
MkParser sf1 <|> p2 = MkParser g
  where
    g inp = case sf1 inp of
              Nothing -> unParser p2 inp
              j -> j        -- the Just case

-- pure does not change the string, and simply gives what the user wants
pure :: a -> Parser a
pure x = MkParser (\inp -> Just (inp, x))

many :: Parser a -> Parser [a]
many p = some p <|> pure []

some :: Parser a -> Parser [a]
some p = liftA2 (:) p (many p)

Let's use this basic parser for an example:

anyChar :: Parser Char
anyChar = MkParser f
    where
        f "" = Nothing
        f (c:cs) = Just (cs, c)

Now, if I run the parser, I get a stack overflow:

runParser (some anyChar) "11test"

I managed to fix it by changing liftA2 to this:

liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftA2 op (MkParser sf1) p2 = MkParser g
  where
    g inp = case sf1 inp of
              Nothing -> Nothing
              Just (middle, a) ->
                  case unParser p2 middle of
                    Nothing -> Nothing
                    Just (rest, b) -> Just (rest, op a b)

But I don't fully understand Why that works. unParser simply wraps p2 in MkParser, and then feeds it the input string, right? Which is what I was doing in my original liftA2 anyways, because it took this parameter (MkParser pb). (I actually made unParser for a different reason but realized it fixed my issue here). What's the difference and how come I get a stack overflow with my original liftA2? Thank you!


Solution

  • The difference is the pattern match on (MkParser pb). This causes liftA2 (:) p (many p) to immediately evaluate many p, which recursively calls some p <|> pure [] and so on, to produce a MkParser (because your <|> is also strict in its first argument).

    By taking p2 and only later unParsering it, liftA2 can return a new parser that can run sf1 and lazily only construct the p2 to run afterwards if that is actually needed.