parsinghaskellabstract-syntax-treerecursive-descent

Understanding this implemented recursive-descent parser in Haskell


I had an assignment problem were I had to parse a tokenized prefix calculator notation on a pre-defined AST. We were given a pretty complete algorithm for parsing (we had to add some stuff).

The algorithm and AST I submitted is as follows:

data Ast = Tall Int | Sum Ast Ast | Mult Ast Ast | Min Ast | Var String  deriving (Eq, Show)

parseExpr :: [String] -> (Ast, [String])
parseExpr [] = error "Empty!"
parseExpr ("+":rs) = let (e1, r1) = parseExpr rs ;(e2, r2) = parseExpr r1 in (Sum e1 e2, r2)
parseExpr ("*":rs) = let (e1, r1) = parseExpr rs ;(e2, r2) = parseExpr r1 in (Mult e1 e2, r2)
parseExpr ("-":rs) = let (e1, r1) = parseExpr rs in (Min e1, r1)
parseExpr (x:rs)
        | all isDigit x = (Tall (read x), rs)
        | all isAlpha x = (Var x, rs)
        | otherwise = error ("Syntax errors "++x)

Example of input/output:

parseExpr ["+","4","*","8","-","5"] =
 (Sum (Tall 4) (Mult (Tall 8) (Min (Tall 5))),[])

What I needed in the continuation of my assignment was the first part of the tuple. This is correct, the assignment is handed in, and all is well for the purposes of the assignment.

Having said that, I don't know what is going on in the recursive function. Especially in these three lines:

parseExpr ("+":rs) = let (e1, r1) = parseExpr rs ;(e2, r2) = parseExpr r1 in (Sum e1 e2, r2)
parseExpr ("*":rs) = let (e1, r1) = parseExpr rs ;(e2, r2) = parseExpr r1 in (Mult e1 e2, r2)
parseExpr ("-":rs) = let (e1, r1) = parseExpr rs in (Min e1, r1)

Of course, I get the ("+":rs) notation. What I have difficulty understanding is the "let (e1, r1) = ....."etc.

And in the guards at the end, I don't see any recursive calls. Yet, recursion happens, right? How does this work?


Solution

  • Let's write one of the definitions out in a more conventional fashion:

    parseExpr ("+":rs) = let (e1, r1) = parseExpr rs -- recursive call #1
                             (e2, r2) = parseExpr r1 -- recursive call #2
                         in (Sum e1 e2, r2)
    

    Given a list that starts with a "+", we first recursively parse the tokens that follow the "+"; the resulting expression is assigned the name e1 and the unused suffix of rs is assigned the name r1. We repeat the process by parsing r1 to get an expression e2 and a remaining input r2. With that, we can construct the Sum value using the two subexpressions e1 and e2, and pass back r2 for our caller to parse.

    Using your example,

    -- Step 1
    parseExpr ["+","4","*","8","-","5"] 
      = let (e1, r1) = parseExpr ["4","*","8","-","5"]
            (e2, r2) = parseExpr r1
        in (Sum e1 e2, r2)
    

    Before we can do any more, we have to evaluate the first recursive call

    -- Step 2
    parseExpr ["4","*","8","-","5"] = (Tall 4, ["*","8","-","5"])
    

    Now we can plug that result back into step 1

    -- Step 3
    parseExpr ["+","4","*","8","-","5"] 
      = let (e1, r1) = (Tall 4, ["*","8","-","5"])
            (e2, r2) = parseExpr r1
        in (Sum e1 e2, r2)
    
    -- Step 4
    parseExpr ["+","4","*","8","-","5"] 
      = let (e2, r2) = parseExpr ["*","8","-","5"]
        in (Sum (Tall 4) e2, r2)
    

    And so forth...