parsing ocamllex menhir

Menhir: get all the values in an interval


I have this rule in parser.mly:

intervalue:
| c = CST(* True False 1 7 89 "sfr" *)
    { Ecst c }
| id = ident (* a-z [a-z]* *)
    { Eident id }
| iv = LSQ l = separated_list(TWOPoints, intervalue) RSQ /* [1..4]*/
    { Elist l }
;

I need the list "l" to contain all the values of [start .. end], for example [1..4]. I searched the manual, and separated_list(TWOPoints, intervalue) only yields the values 1 and 4. But I need every value from 1 to 4 inclusive, as if I had written [1..2..3..4], without having to list them exhaustively.


Solution

  • separated_list does not reflect your desired syntax, as far as I can see. But then, neither does using intervalue for the limits of the interval.

    separated_list is not correct because it matches a list of any number of elements (possibly zero; separated_nonempty_list is the variant that requires at least one) separated by a delimiter. In particular, separated_list(TWOPoints, intervalue) will not just match 1..4, but also 1, and 1..4..7, among other things. Those other things include nested intervalues, such as 2..[4..7], which seems unlikely to be a desired construct (although since I don't know what your language looks like, perhaps it is).

    You seem to be using separated_list in the mistaken belief that it is the only way to turn the reduction into an OCaml list. That's not true, since you have the full power of OCaml available to you; you could write that production as

    | LSQ low = CST TWOPoints high = CST RSQ     { [ low; high ] }
    

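    Or even, since OCaml has no built-in range literal, something like this (assuming CST carries an int)

    | LSQ low = CST TWOPoints high = CST RSQ     { List.init (max 0 (high - low + 1)) (fun i -> low + i) }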
    

    although that won't work for all possible CST tokens (such as [1 .. "a"]). Furthermore, it doesn't permit the use of non-constant limits, such as [1 .. limit].

    But mixing syntax with run-time semantics like that is almost certainly not what you want. How would you deal with program text like the example above ([1 .. limit]), where limit is a variable which will be assigned a value during execution of the program? (Or even many values, as the program executes in a loop.) The parser should limit itself to producing a useful representation of the program to execute, and the most likely production rule will be something like this (where Value needs to be defined according to the actually desired syntax):

    | LSQ low = Value TWOPoints high = Value RSQ     { Einterval (low, high) }
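
To make the split between parsing and execution concrete, here is a minimal sketch of how an interval node might be expanded later, at evaluation time. Only the constructor names Ecst, Eident, Elist and Einterval come from the grammar above; the value type, the environment representation, and the range and eval functions are assumptions made up for illustration, not part of the original code.

```ocaml
(* Hypothetical run-time values; the real language may have more cases. *)
type value =
  | Vint of int
  | Vbool of bool
  | Vstring of string
  | Vlist of value list

(* Hypothetical AST. Einterval keeps its limits as unevaluated expressions. *)
type expr =
  | Ecst of value
  | Eident of string
  | Elist of expr list
  | Einterval of expr * expr   (* [low .. high] *)

(* Environment: an association list from variable names to values. *)
type env = (string * value) list

(* Inclusive integer range; empty when low > high. *)
let range low high =
  List.init (max 0 (high - low + 1)) (fun i -> low + i)

let rec eval (env : env) (e : expr) : value =
  match e with
  | Ecst v -> v
  | Eident x ->
      (match List.assoc_opt x env with
       | Some v -> v
       | None -> failwith ("unbound variable " ^ x))
  | Elist es -> Vlist (List.map (eval env) es)
  | Einterval (lo, hi) ->
      (* Limits are evaluated only now, so they may be variables. *)
      (match eval env lo, eval env hi with
       | Vint l, Vint h -> Vlist (List.map (fun n -> Vint n) (range l h))
       | _ -> failwith "interval limits must be integers")
```

With limit bound to 4 in the environment, eval [("limit", Vint 4)] (Einterval (Ecst (Vint 1), Eident "limit")) produces Vlist [Vint 1; Vint 2; Vint 3; Vint 4], while an interval such as [1 .. "a"] is rejected when it is evaluated rather than when it is parsed.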