parsing f# fparsec

How to parse a sequence of words separated by double spaces using FParsec?


Given the input:

alpha beta gamma  one two three

How could I parse this into the below?

[["alpha"; "beta"; "gamma"]; ["one"; "two"; "three"]]

I can write this when there is a more distinctive separator (e.g. __), because then

sepBy (sepBy word (pchar ' ')) (pstring "__")

works. With a double space, however, the pchar ' ' in the inner sepBy consumes the first space, the word parser then fails on the second space, and the whole parser fails.


Solution

  • The FParsec manual says that in sepBy p sep, if sep succeeds and the subsequent p fails (without changing the state), the entire sepBy fails too. Hence, your goal is:

    1. to make the inner separator fail if it encounters more than a single space character;
    2. to backtrack on that failure, so that the "inner" sepBy loop ends cleanly and passes control to the "outer" sepBy loop.

    Here's how to do both:

    open FParsec

    // this is your word parser; yours may differ, of course,
    // I just made it as simple as possible
    let pWord = many1Satisfy isAsciiLetter
    
    // this is the Inner separator: exactly one space between individual words
    let pSepInner =
        pchar ' '
        .>> notFollowedBy (pchar ' ') // guard: fail if a 2nd space follows
        |> attempt                    // backtrack on failure, so it is NON-fatal
    
    // this is the Outer separator
    let pSepOuter =
        pchar ' '
        |> many1                      // loop over 1+ spaces
    
    // this is the parser that returns a string list list;
    // it is equivalent to: sepBy (sepBy pWord pSepInner) pSepOuter
    let pMain =
        pWord
        |> sepBy <| pSepInner         // the Inner loop
        |> sepBy <| pSepOuter         // the Outer loop
    

    Use:

    run pMain "alpha beta gamma  one two three"
    Success: [["alpha"; "beta"; "gamma"]; ["one"; "two"; "three"]]
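
    If you need the parsed value rather than the printed result, here is a minimal sketch of one way to unpack it, assuming the FParsec package is already referenced. The parse helper and the trailing eof check are additions made for illustration, not something the parsers above require. The type annotations are only there to avoid a value-restriction error when the parsers are defined on their own.

```fsharp
open FParsec

// Same parsers as above, written in prefix form with explicit types.
let pWord : Parser<string, unit> = many1Satisfy isAsciiLetter

let pSepInner : Parser<char, unit> =
    attempt (pchar ' ' .>> notFollowedBy (pchar ' '))

let pSepOuter : Parser<char list, unit> = many1 (pchar ' ')

let pMain : Parser<string list list, unit> =
    sepBy (sepBy pWord pSepInner) pSepOuter

// Hypothetical helper: returns Some groups on success, None on failure.
// The eof check (an assumption, not in the answer above) rejects
// input that has unparsed text left over.
let parse (input : string) : string list list option =
    match run (pMain .>> eof) input with
    | Success (result, _, _) -> Some result
    | Failure _ -> None

printfn "%A" (parse "alpha beta gamma  one two three")
// prints: Some [["alpha"; "beta"; "gamma"]; ["one"; "two"; "three"]]
```

    Returning an option keeps the sketch short; if you want the error message, match on Failure (msg, _, _) instead and return it.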