
Order of precedence in lexer


I am writing a lexer and a parser for Excel formulas.

In Excel, a cell can be given a name. For example, abc is a valid name, whereas naming a cell B2 is forbidden, to avoid confusion with the cell B2. So when we meet a formula =B2, we can be sure that B2 refers to a cell rather than a user-defined name.

In my lexer_formula.mll, I have defined identifiers:

let lex_cell = ['A' - 'Z']+ ['0' - '9']+ (* regular expressions to include all the cells *)
let lex_name = ['A' - 'Z' '0' - '9']+ (* regular expressions to include all the names *)

But a string like B2 will match both lex_cell and lex_name. Does anyone know how I could tell the lexer to try lex_cell first, then lex_name? Will it be sufficient to put lex_cell before lex_name in rule token = parse?


Solution

  • According to the ocamllex manual, it's sufficient to put lex_cell first:

    If several regular expressions match a prefix of the input, the “longest match” rule applies: the regular expression that matches the longest prefix of the input is selected. In case of tie, the regular expression that occurs earlier in the rule is selected.
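
As an illustration of the tie-breaking rule quoted above, here is a minimal sketch of what lexer_formula.mll could look like. The token type and its constructors (CELL, NAME, EOF) and the whitespace rule are assumptions made for the example, not taken from the question; only lex_cell and lex_name come from the original.

```ocaml
(* lexer_formula.mll: a sketch only. The token type below is hypothetical;
   a real project would normally take its tokens from the parser. *)
{
  type token =
    | CELL of string   (* a cell reference such as B2 *)
    | NAME of string   (* a user-defined name such as ABC *)
    | EOF
}

let lex_cell = ['A'-'Z']+ ['0'-'9']+
let lex_name = ['A'-'Z' '0'-'9']+

rule token = parse
  | [' ' '\t']+    { token lexbuf }  (* assumed: skip blanks between tokens *)
  | lex_cell as s  { CELL s }        (* listed first, so it wins ties against lex_name *)
  | lex_name as s  { NAME s }
  | eof            { EOF }
```

With this ordering, the input B2 is matched by both lex_cell and lex_name with the same length, so the earlier rule applies and the lexer returns CELL "B2". Note that the longest-match rule is applied before the ordering: for an input such as B2A, lex_name matches three characters while lex_cell matches only two, so the result is NAME "B2A" regardless of which rule comes first.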