I am writing a lexer and a parser for Excel formulas.
In Excel, we can assign a name to a cell. For example, abc is a valid name, whereas naming a cell B2 is forbidden, to avoid confusion with the cell B2. So when we meet a formula =B2, we can be sure that B2 refers to a cell rather than a user-defined name.
In my lexer_formula.mll, I have defined identifiers:
let lex_cell = ['A' - 'Z']+ ['0' - '9']+ (* regular expressions to include all the cells *)
let lex_name = ['A' - 'Z' '0' - '9']+ (* regular expressions to include all the names *)
But a string like B2 will match both lex_cell and lex_name. Does anyone know how I can tell the lexer to try lex_cell first, then lex_name? Is it sufficient to put lex_cell before lex_name in rule token = parse?
According to the ocamllex manual, it's sufficient to put lex_cell first:
If several regular expressions match a prefix of the input, the “longest match” rule applies: the regular expression that matches the longest prefix of the input is selected. In case of tie, the regular expression that occurs earlier in the rule is selected.
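As a minimal sketch of what that looks like in lexer_formula.mll: the token type below (CELL, NAME, EOF) is hypothetical and only there to make the fragment self-contained, and a real lexer would also need cases for =, whitespace, operators and so on. What matters is the order of the cases inside rule token = parse, not the order of the let definitions.

```ocaml
(* lexer_formula.mll -- sketch only; the token type is a hypothetical stand-in *)
{
  type token =
    | CELL of string
    | NAME of string
    | EOF
}

let lex_cell = ['A'-'Z']+ ['0'-'9']+    (* cell references such as B2 *)
let lex_name = ['A'-'Z' '0'-'9']+       (* user-defined names *)

rule token = parse
  | lex_cell as s { CELL s }   (* listed first, so it wins ties such as "B2" *)
  | lex_name as s { NAME s }   (* selected when it matches a longer prefix *)
  | eof           { EOF }
```

Note that the tie-break only applies when both expressions match a prefix of the same length. On an input like B2C, lex_cell matches B2 but lex_name matches all of B2C, so the longest-match rule selects lex_name regardless of the order.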