haskellfunctional-programming

If we bind two parsers and the second parser fails, does the string get parsed once?


So suppose this is a parser:

data Parser a = MkParser (String -> Maybe (String, a))


runParser :: Parser a -> String -> Maybe a
runParser (MkParser sf) inp = case sf inp of
                                Nothing -> Nothing
                                Just (_, a) -> Just a


unParser :: Parser a -> String -> Maybe (String, a)
unParser (MkParser sf1) = sf1

(>>=) :: Parser a -> (a -> Parser b) -> Parser b
-- takes a parser, and a second parser (second parser takes a char)
-- parses `a`, then feeds `a` to second parser...
MkParser pa >>= k = MkParser sf
    where
        sf inp = case pa inp of
            Nothing -> Nothing
            Just (as, a) -> unParser (k a) as
-- Question: If the second parser fails, does the first
-- parser still parse? What string does it return?
-- as or inp?

-- parses any char
anyChar :: Parser Char
anyChar = MkParser sf
  where
    sf "" = Nothing
    sf (c:cs) = Just (cs, c)

-- parses the char we want
char :: Char -> Parser Char
char wanted = MkParser sf
  where
    sf (c:cs) | c == wanted = Just (cs, c)
    sf _ = Nothing

Suppose I do

secondFail = anyChar >>= \c -> char c
runParser secondFail "test"

This gives Nothing because the \c -> char c fails (e is not t).

My question is, did the first parser run? Did the input string get parsed 1 time? Is it "est" or is it the case that since the second parser failed, the whole thing failed and the string did not get parsed?

I'm not sure how to write code to test the output, since the output is just Nothing.

Edit: I think I answered my own question. I added this function (basically if the first parser succeeds, do nothing, but if the first parser fails, run the second parser:

(<|>) :: Parser a -> Parser a -> Parser a
-- Choice / Backtrack.
-- takes two parsers
-- tries one parse
-- pa1 stringInput
-- saves the location
-- if it fales, reverts back,
-- gives stringInput (no change) to 
-- pa2.
MkParser pa1 <|> pa2 = MkParser sf
    where
        sf inp = case pa1 inp of
            Nothing -> unParser pa2 inp
            j -> j

-- I want to be able to produce a parser that gives exactly what I want and
-- doesn't change the input string at all, doesn't touch it
pure :: a -> Parser a
pure a = MkParser (\inp -> Just (inp, a))

Added this to run parsers:

runParser2 :: Parser a -> String -> Maybe (String, a)
runParser2 (MkParser pa) inp = pa inp

And then I did this:

secondFail = (char 'w' >>= \a -> char 'o' >>= \b -> char 'r'  >>= \c -> char 'l') <|> pure 'x'

And then this to run it:

runParser2 secondFail "woqld"

This gives

Just ("woqld",'x')

Does it give Just ("woqld",'x') because (char 'w' >>= \a -> char 'o' >>= \b -> char 'r' >>= \c -> char 'l') is seen as just one parser? And even though the w and o parser succeeded, the whole parser failed since r failed? So it reverted to the original input and gave it to pure 'x'?


Solution

  • For the second parser char c to be able to start it must first be given the character that anyChar returns. So, the first parser must have succeeded if the second parser fails.

    You can test it like this:

    import Debug.Trace
    
    traceCurrentString :: Parser ()
    traceCurrentString = MkParser (\s -> traceShow s (Just (s, ())))
    
    secondFail = anyChar >>= \c -> traceCurrentString >>= \() -> char c
    

    Then secondFail produces this output:

    ghci> runParser secondFail "test"
    "est"
    Nothing
    

    Beware that tracing like this is not reliable and can be affected by optimizations, so you should only use it for debugging and even then you should remain skeptical of the results.


    Edit: Regarding your edit, you should instead use a parser like this:

    secondFail =
      char 'w' >>= (\a ->
        char 'o' >>= (\b ->
          char 'r'  >>= (\c -> char 'l' <|> pure '4')
            <|> pure '3')
        <|> pure '2')
      <|> pure '1'
    

    Then you'll see the result you expected:

    ghci> runParser2 secondFail "woqld"
    Just ("qld",'3')