error-handlingf#grammarfparsec

Error position in recursive FParsec expressions


My grammar contains expressions characterized by an identifier that only optionally is followed by a parenthesized list of expressions.

My problem is that Fparsec will show an "unintuitive" position on syntax errors when the error occurs in a nested expression.

For an example, consider this simple parser

let x = skipChar 'x' .>> spaces >>% Node.X 
let y = skipChar 'y' .>> spaces >>% Node.Y
let c = skipChar ',' .>> spaces 
let left = pchar '(' .>> spaces 
let right = pchar ')' .>> spaces 
let expr, exprRef = createParserForwardedToRef()
let paramList, paramListRef = createParserForwardedToRef()
let paramTuple = left >>. paramList .>> right 


let xOrY = choice [x ; y] 
let exprWithArgs = xOrY .>>. paramTuple |>> Node.Expr

exprRef.Value <- choice [ attempt exprWithArgs ; xOrY ] 
paramListRef.Value <- sepBy1 expr c

let parser = expr .>> eof
let result = run parser "x(y, y, y(x, y, y(x,y)) )"

printf "\nParsing correct:\n%O" result 

let resultWithError = run parser "x(y, y, y(x, y, y(x,z)) )"
printf "\n\nParsing error:\n%O" resultWithError

The output of the program is:

Parsing correct:
Success: Expr (X, [Y; Y; Expr (Y, [X; Y; Expr (Y, [X; Y])])])

Parsing error:
Failure:
Error in Ln: 1 Col: 2
x(y, y, y(x, y, y(x,z)) )
 ^
Expecting: end of input

The more nested expressions I have, the more difficult it becomes to find the actual erroneous position.

Is there a way to change my grammar (or to use/configure) FParsec in such a way that the error becomes better to find? I wished, I would get an intuitive error message like this:

Parsing correct:
Success: Expr (X, [Y; Y; Expr (Y, [X; Y; Expr (Y, [X; Y])])])

Parsing error:
Failure:
Error in Ln: 1 Col: 21
x(y, y, y(x, y, y(x,z)) )
                    ^
Expecting: 'x' or 'y'


Solution

  • Let's use "y(x,z)" as an example input. The problem occurs in the expr/exprRef parser. First, it attempts to parse exprWithArgs, which fails because of the z. The parser backtracks (because of attempt), and tries to parse xOrY instead, which succeeds on the y. But the next char is ( instead of eof, so parser then fails at that position with an unhelpful message.

    Since exprWithArgs starts by parsing xOrY, we can factor that out, leading to this solution:

    exprRef.Value <-
        parse {
            let! xy = xOrY
            match! opt paramTuple with
                | Some args -> return Node.Expr (xy, args)
                | None -> return xy
        }
    

    This has the advantage of not backtracking, so the parser always moves forward until it can't proceed further, resulting in a clearer error message. Note that exprWithArgs is no longer used.

    If you don't like the computation builder, you can do this instead:

    exprRef.Value <-
        pipe2
            xOrY
            (opt paramTuple)
            (fun xy argsOpt ->
                match argsOpt with
                    | Some args -> Node.Expr (xy, args)
                    | None -> xy)
    

    Output is now:

    Parsing correct:
    Success: Expr (X, [Y; Y; Expr (Y, [X; Y; Expr (Y, [X; Y])])])
    
    Parsing error:
    Failure:
    Error in Ln: 1 Col: 21
    x(y, y, y(x, y, y(x,z)) )
                        ^
    Expecting: 'x' or 'y'
    

    There might be simpler solutions as well, but I think this does what you want.