I'm writing a parser for a specific file format using FParsec, as a first-ish foray into learning F#. Part of the file has the following format:
```
{ 123 456 789 333 }
```

where the numbers inside the braces are pairs of values, and there can be an arbitrary number of spaces separating them. So this would also be valid input:

```
{ 22   456  7     333 }
```

And of course the contents of the braces might be empty, i.e. `{}`.
In addition, I want the parser to handle the case where the content is malformed, e.g. `{ some descriptive text }`, or, more likely, `{ 12 3 4}` (invalid since the `4` isn't paired with anything). In that case I just want the raw contents saved so they can be processed separately.
I have this so far:
```fsharp
open FParsec

type DimNummer = int
type ObjektNummer = int
type DimObjektPair = DimNummer * ObjektNummer
type ObjektListResult = Result<DimObjektPair list, string>

let sieObjektLista =
    let pnum = numberLiteral NumberLiteralOptions.None "dimOrObj"
    let ws = spaces
    let pobj = pnum .>> ws |>> fun x ->
        let on: ObjektNummer = int x.String
        on
    let pdim = pnum |>> fun x ->
        let dim: DimNummer = int x.String
        dim
    let pdimObj = (pdim .>> spaces1) .>>. pobj |>> DimObjektPair
    let toObjektLista(objList:list<DimObjektPair>) =
        let res: ObjektListResult = Result.Ok objList
        res
    let pdimObjs = sepBy pdimObj spaces1
    let validList = pdimObjs |>> toObjektLista
    let toInvalid(str:string) =
        let res: ObjektListResult =
            match str.Trim(' ') with
            | "" -> Result.Ok []
            | _ -> Result.Error str
        res
    let invalidList = manyChars anyChar |>> toInvalid
    let pres = between (pchar '{') (pchar '}') (ws >>. (validList <|> invalidList) .>> ws)
    pres

let parseSieObjektLista = run sieObjektLista
```
However, running this on a valid sample gives me an error:
```
{ 53735 7785 86231 36732 }
             ^
Expecting: whitespace or '}'
```
You're trying to consume too many spaces. Look: `pdimObj` is a `pdim`, followed by some spaces, followed by `pobj`, which is itself a `pnum` followed by some spaces. So if you look at the first part of the input:
```
{ 53735 7785 86231 36732 }
  \___/^\__/^
    |  |  | |
    |  |  | +--- ws
    |  |  +----- pnum
    |  +-------- spaces1
    +----------- pdim (itself a pnum)

        \___/
        pobj (pnum followed by ws)

  \_________/
    pdimObj
```
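One can clearly see from here that `pdimObj` consumes everything up to `86231`, including the space just before it. Therefore, when `sepBy` inside `pdimObjs` looks for the next separator (`spaces1`), it can't find any. Because that separator fails without consuming input, `sepBy` simply stops and returns the single pair it has parsed so far. Parsing then continues at `86231`, where `between` expects more whitespace or the closing `'}'`, and that is exactly where the error is reported.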
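A quick way to confirm this is to run only the list parser on the brace contents and look at what it leaves behind. This is a minimal sketch written as an F# script for `dotnet fsi`; it assumes `pint32` as a stand-in for the question's `numberLiteral`-based `pnum`, purely for brevity.

```fsharp
#r "nuget: FParsec"
open FParsec

// Same shape as the question's parsers; pint32 replaces numberLiteral + int (assumption).
let pdim : Parser<int, unit> = pint32
let pobj : Parser<int, unit> = pint32 .>> spaces // trailing spaces are eaten here...
let pdimObj = (pdim .>> spaces1) .>>. pobj
let pdimObjs = sepBy pdimObj spaces1             // ...so this separator never matches

// Run only the list parser and also capture whatever input is left unparsed.
match run (pdimObjs .>>. restOfLine false) "53735 7785 86231 36732" with
| Success ((pairs, rest), _, _) -> printfn "pairs = %A, left over = %A" pairs rest
| Failure (msg, _, _) -> printfn "Error: %s" msg
```

This reports a single pair, `(53735, 7785)`, with `86231 36732` left unparsed: `sepBy` succeeded, it just stopped early.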
The smallest way to fix this is to make `pdimObjs` use `many` instead of `sepBy`: since `pobj` already consumes trailing spaces, there is no need to also consume them in `sepBy`:

```fsharp
let pdimObjs = many pdimObj
```
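For reference, here is a minimal, self-contained sketch of the `many` variant, written as an F# script for `dotnet fsi` (the `#r "nuget: FParsec"` line pulls in the package). It simplifies the question's code under some assumptions: `pint32` replaces the `numberLiteral`-plus-`int` conversion, `pbraced` and `parse` are illustrative names, and the malformed-content fallback is left out so that only the whitespace handling is shown.

```fsharp
#r "nuget: FParsec"
open FParsec

type DimNummer = int
type ObjektNummer = int
type DimObjektPair = DimNummer * ObjektNummer

// Assumption: pint32 stands in for numberLiteral + int conversion.
let pdim : Parser<DimNummer, unit> = pint32
// pobj consumes its own trailing whitespace, as in the question.
let pobj : Parser<ObjektNummer, unit> = pint32 .>> spaces
let pdimObj : Parser<DimObjektPair, unit> = (pdim .>> spaces1) .>>. pobj

// No separator parser: each pair already ends after its trailing whitespace.
let pdimObjs = many pdimObj

// Hypothetical top-level parser: '{', optional spaces, the pairs, then '}'.
let pbraced = between (pchar '{' .>> spaces) (pchar '}') pdimObjs

let parse input =
    match run pbraced input with
    | Success (pairs, _, _) -> Result.Ok pairs
    | Failure (msg, _, _) -> Result.Error msg

printfn "%A" (parse "{ 53735 7785 86231 36732 }")
printfn "%A" (parse "{ 22   456  7     333 }")
printfn "%A" (parse "{}")
```

`many` stops as soon as `pdimObj` fails without consuming input (here, at the `}`), which is why no separator parser is needed.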
But a cleaner way, in my opinion, would be to remove `ws` from `pobj` and instead handle possible trailing spaces in `pdimObjs` via `sepEndBy`. Intuitively, trailing spaces aren't part of the number representing your object (whatever that is):

```fsharp
let pobj = pnum |>> fun x ->
    let on: ObjektNummer = int x.String
    on
...
let pdimObjs = sepEndBy pdimObj spaces1
```
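Neither fix on its own covers the malformed-content requirement from the question, though. `invalidList` uses `manyChars anyChar`, which also swallows the closing `}`. For `{ some descriptive text }` the pair list succeeds with zero pairs, so the fallback is never tried. And with either fix, `{ 12 3 4}` makes `pdimObjs` fail only after it has consumed input, while `<|>` tries its second alternative only when the first one fails without consuming input. The sketch below combines the `sepEndBy` version with one possible fallback; it is an assumption-based extension, not part of the answer above. `attempt` rewinds the input when the pair list fails, `followedBy (pchar '}')` requires the pairs to reach the closing brace, and `noneOf "}"` stops the raw-text fallback before the brace. `pint32` again stands in for `numberLiteral`.

```fsharp
#r "nuget: FParsec"
open FParsec

type DimNummer = int
type ObjektNummer = int
type DimObjektPair = DimNummer * ObjektNummer
type ObjektListResult = Result<DimObjektPair list, string>

// Assumption: pint32 stands in for numberLiteral + int conversion.
let pdim : Parser<DimNummer, unit> = pint32
let pobj : Parser<ObjektNummer, unit> = pint32 // no trailing whitespace here any more
let pdimObj : Parser<DimObjektPair, unit> = (pdim .>> spaces1) .>>. pobj

// sepEndBy accepts an optional separator after the last pair.
let pdimObjs = sepEndBy pdimObj spaces1

// Valid only if the pairs run right up to the closing brace. attempt rewinds the
// input if this fails after consuming something (e.g. the lone 4 in "{ 12 3 4}"),
// so that <|> still tries the fallback below.
let validList : Parser<ObjektListResult, unit> =
    attempt (pdimObjs .>> followedBy (pchar '}')) |>> Result.Ok

// Fallback: keep the raw text up to (but not including) the closing brace.
let invalidList : Parser<ObjektListResult, unit> =
    manyChars (noneOf "}") |>> fun s -> Result.Error (s.Trim())

let sieObjektLista =
    between (pchar '{' .>> spaces) (pchar '}') (validList <|> invalidList)

let parseSieObjektLista input = run sieObjektLista input

for input in [ "{ 53735 7785 86231 36732 }"; "{}"; "{ 12 3 4}"; "{ some descriptive text }" ] do
    match parseSieObjektLista input with
    | Success (result, _, _) -> printfn "%s -> %A" input result
    | Failure (msg, _, _) -> printfn "%s -> parse error:\n%s" input msg
```

With this layout, `{ 12 3 4}` yields `Error "12 3 4"` and `{ some descriptive text }` yields `Error "some descriptive text"`, while well-formed input still yields `Ok` with the list of pairs. Leading spaces are already consumed after the `{`, and trailing ones are trimmed, mirroring the question's `Trim`.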