I want to write a small editor for a specific language. In the editor, we will be able to indent
one or several lines (ie, adding white-spaces on the left hand of each line); we will also be able to format
the whole code (ie, alter white-spaces and newlines in appropriate locations).
Given a program, my front-end by ocamllex
and ocamlyacc
could build a Abstract Syntax Tree (AST)
. I would like to know what are common ways to store positions of elements in AST.
One way I guess is to append a (start) position
to each element of the AST. For example, if the type of an expression is defined as follow:
type expression =
...
| E_int of int
| E_function_EEs of Function.t * (expression list)
It will become:
type expression =
...
| E_int of position * int
| E_function_EEs of position * Function.t * (expression list)
Then, if we know the length of each element, we can infer the position of everything in the editor. Is it a common way to do so? I don't find it nice...
You don't have to repeat position
for each pattern. You could just write one in the end:
type expression =
...
| E_int of int
| E_function_EEs of Function.t * (expression list)
| E_loc of position * expression
As a result, for existing functions over expression
, you could just add a case for E_loc
, and do not need to touch existing cases.
To construct E_loc
automatically while parsing, you could add in .mly
for example:
loc(EXPRESSION):
| t = EXPRESSION { E_loc (($startpos, $endpos), t) }
(* immediate construction: *)
expression:
| INT { E_loc (($startpos, $endpos), E_int $1) }
(* or delay construction: *)
expression:
| e0 = loc(expression) PLUS e1 = loc(expression) { E_function_EEs (Function.PLUS, [e0; e1]) }