As an exercise for myself I'm using FParsec to write a function that can generate a random string from a spec in (limited) Regex form.
E.g.
Input: ([Hh]ello ){1,3} world!?
Output: Hello hello world!
Input: (\d{1,2}\.){3}(\d{1,2})
Output: 38.12.29.05
I have a lot of it working, but I'm having trouble with the idea of postfix terms (i.e. that a parser might need to go back and modify the output, not just the position in the Char stream). E.g. "a" vs "a+"
Here's a cut-down version of my domain types:
type Count =
| ExactCount of int
| MinCount of int
| MaxCount of int
| RangeCount of int * int
type Term =
| CharLiteral of char
| Count of Term * Count
type RandExpSpec = Term list
So the input ab
should generate [CharLiteral 'a'; CharLiteral 'b']
but ab+
should generate [CharLiteral 'a'; Count (CharLiteral 'b', MinCount 1)]
. So that kind of implies that, on encountering a Count
term in the stream, the parser needs to backtrack through the output in order to wrap the last term in another object.
Now, I don't know how to do that. Here's my current parsing definition, that does (mostly) work but is very inefficient:
let parseCharLiteral = choice [ letter; digit ] |>> CharLiteral
let rec parseTerm =
parse.Delay(fun () -> choice [ parseCharLiteral ])
and parseCount =
parseTerm
.>>. choice [ skipChar '*' >>% (MinCount 0)
skipChar '+' >>% (MinCount 1)
skipChar '?' >>% (RangeCount(0, 1)) ]
|>> Count
let parseTerms =
many ((attempt parseCount) <|> parseTerm) .>> eof
You can see that in parseCount
I call parseTerm
first, then parse the actual count information. Then, in parseTerms
I attempt the parseCount
parser every time, backtracking through the input if it doesn't work. This is very inefficient because I'm essentially making two passes on almost every char in the input stream just in case it's followed by a count modifier.
Is there a more efficient way to do this? I feel like I should be writing something more like:
let parseCharLiteral = choice [ letter; digit ] |>> CharLiteral
let rec parseTerm =
parse.Delay(fun () -> choice [ parseCharLiteral ] .>>. (attempt parseCount))
and parseCount =
choice [ skipChar '*' >>% (MinCount 0)
skipChar '+' >>% (MinCount 1)
skipChar '?' >>% (RangeCount(0, 1)) ]
|>> Count
let parseTerms =
many parseTerm .>> eof
but I can't make that work since parseCount
qould need to wrap the previous term returned by parseTerm
.
I think you can use opt
to allow parseCount
to not find a count if there isn't one:
let parseCount =
parseTerm
.>>. opt (choice [ skipChar '*' >>% (MinCount 0)
skipChar '+' >>% (MinCount 1)
skipChar '?' >>% (RangeCount(0, 1)) ])
|>> function
| term, None -> term
| term, Some count -> Count (term, count)
let parseTerms =
many parseCount .>> eof