I am using Camlp4 to parse a string of quantifiers in which quantification keywords and variables are separated by comma. One example is like below:
exists x,y,z, forall a,b, exists h,k
Here, exists
and forall
are keywords, x,y,z,a,b,h,k
are identifiers. The corresponding tokens are EXISTS
, FORALL
and IDENTIFIER of string
.
My data structure:
type quantifier =
| Exists of string
| Forall of string
To parse the string of quantifiers above, my rules are:
id: [[
`IDENTIFIER s-> s
]];
one_kind_quantifiers: [[
`EXISTS; il=LIST1 id SEP `COMMA -> List.map (fun v -> Exists v) il
|`FORALL; il=LIST1 id SEP `COMMA -> List.map (fun v -> Forall v) il
]];
quantifiers: [[
t=LIST0 one_kind_quantifiers SEP `COMMA -> List.flatten t
]];
However, my parser always throws an error:
Stream.Error("[id] expected after COMMA (in [one_kind_quantifiers])").
Do you know how to fix this problem? How to make LIST1
stop throwing error when it detects the element after `COMMA
is a keyword?
Thank you very much!
(For more information, if I use white-space to separate variables impacted by the same quantification keyword, like exists x y z, forall a b, exists h k
. And remove the SEP `COMMA
in the one_kind_quantifiers
rule, then the parser can perfectly parse this new string).
===========================
Update solution:
With suggestion from Igor (@ygrek), I am able to write the expected parser by not using LIST1 but manually writing rules to parse list of string.
id_list: [[
`IDENTIFIER s -> [s]
|t=`id_list; `COMMA; `IDENTIFIER s -> t@[s]
]];
one_kind_quantifiers: [[
`EXISTS; il=id_list -> List.map (fun v -> Exists v) il
|`FORALL; il=id_list -> List.map (fun v -> Forall v) il
]];
quantifiers: [[
t=LIST0 one_kind_quantifiers SEP `COMMA -> List.flatten t
]];
Note that the rule to parse a list of strings is:
id_list: [[
`IDENTIFIER s -> [s]
| t=`id_list; `COMMA; `IDENTIFIER s -> t@[s]
]];
but not:
id_list: [[
`IDENTIFIER s -> [s]
| `IDENTIFIER s; `COMMA; t=`id_list -> [s]@t
]];
The second way of writing the id_list
rule throws the same error as when using LIST1
. (So I guess that maybe it is the way that LIST1
is implemented...)
camlp4 is recursive-descent parser and IIRC it will backtrack only on the first token of each rule, once first token matches it will continue till the end of the rule. In this case for LIST1
it can match on comma so it descends but second token is not as expected and it is too late to backtrack. I guess unrolling LIST1
and inlining into your grammar will workaround this issue, but will probably be rather ugly.