I'm trying to build a C compiler from scratch. Googling took me to the https://craftinginterpreters.com/ site. There are a lot of explanations how source code parsers work in detail. I've reached "Parsing expressions" chapter and stuck in this.
They offer that algorithm of creating parse tree (recursive descend)
Expr expression() {
Expr left = equality();
...
}
Expr equality() {
Expr left = comparison();
...
}
Expr comparison() {
Expr left = term();
...
}
Expr term() {
Expr left = factor();
...
}
Expr factor() {
Expr left = unary();
...
}
Expr unary() {
if (WeFoundToken('-')) {
return Expr.Unary(prev(), unary());
}
return primary; // if there's no '-'
}
But what happens when we parse that simple expression: 1 * 2 - 3
I cannot understand how that's going to be working because top-level expression()
when called falls through all lower-precedence operators parsing functions to the lowest precedence unary()
function which will iterate through tokens, find (-
) and return it to us as a unary (-
) operation (-3
) expression
In that case, there will be no way to build a valid tree that would look like this:
-
* 3
1 2
That would be invalid without a root node:
*
1 2 (-3)
Let's step through the example expression 1 * 2 - 3
. The parser starts at the beginning.
1
in primary
. Consume 1
.factor
, expr
is set to 1
. The while
condition then matches the *
. Consume *
.
while
loop, it tries to consume a unary
next, this successfully matches primary
2
. Consume 2
.1 * 2
. No more [*/]
to consume, loop terminates. Return this expression to term
.term
enters while loop and sees -
. Consumed -
(meaning the next token is 3
, not -
)
factor
, which successfully matches 3
in primary
. Consume the 3
.1 * 2 - 3
.This results in the tree:
-
* 3
1 2
In other words, because term
has already consumed 1 * 2
as a factor
, term
will enter the while
loop, not call factor
again. This successfully recognizes the -
as an operator in term
instead of part of a unary
expression.