haskell attoparsec

Attoparsec: matching any of strings with common prefix


I am trying to use attoparsec to parse a limited set of valid strings that share a common prefix. However, my attempts result in either a Partial result or a premature Done:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import qualified Data.Attoparsec.Text as PT

data Thing = Foobar | Foobaz | Foobarz

thingParser1 = PT.string "foobarz" *> return Foobarz
           <|> PT.string "foobaz" *> return Foobaz
           <|> PT.string "foobar" *> return Foobar

thingParser2 = PT.string "foobar" *> return Foobar
           <|> PT.string "foobaz" *> return Foobaz
           <|> PT.string "foobarz" *> return Foobarz

What I want is for "foobar" to result in Foobar, "foobarz" to result in Foobarz, and "foobaz" to result in Foobaz. However,

PT.parse thingParser1 "foobar"

results in a PT.Partial and

PT.parse thingParser2 "foobarz"

results in a PT.Done "z" Foobar.


Solution

  • As you can see, the order of alternatives matters in the parsec family of parser combinator libraries: p <|> q first tries the parser on the left and only continues with the parser on the right if the left one fails.

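    Another thing to notice is that your parsers don't require that the input ends after parsing, and that parse is incremental: it returns a Partial whenever more input could still change the outcome. You can avoid the Partial by using parseOnly instead of parse to run the parser, because parseOnly treats the input as complete. Note that parseOnly on its own discards leftover input rather than rejecting it; to require that everything is consumed, append PT.endOfInput to the parser. If you keep using parse, feed the result an empty chunk to signal the end of input, then use the maybeResult or eitherResult functions to convert the Result into a Maybe or Either, respectively.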
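
As a minimal sketch of the two ways of running the parser described above, assuming the question's thingParser1 and a Thing type extended with Show and Eq instances purely so results can be printed and compared. The names parseThing and parseThingIncremental are illustrative, not part of attoparsec:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as PT
import Data.Text (Text)

-- Show and Eq are added here only for printing and comparing results.
data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

-- Longest alternative first, as in thingParser1 from the question.
thingParser1 :: PT.Parser Thing
thingParser1 = Foobarz <$ PT.string "foobarz"
           <|> Foobaz  <$ PT.string "foobaz"
           <|> Foobar  <$ PT.string "foobar"

-- parseOnly treats the input as complete, so no Partial is returned;
-- endOfInput additionally rejects trailing characters.
parseThing :: Text -> Either String Thing
parseThing = PT.parseOnly (thingParser1 <* PT.endOfInput)

-- Incremental variant: feeding an empty chunk marks the end of input,
-- and maybeResult collapses the Result into a Maybe.
-- Unlike parseThing, this does not reject leftover input.
parseThingIncremental :: Text -> Maybe Thing
parseThingIncremental input =
  PT.maybeResult (PT.feed (PT.parse thingParser1 input) mempty)

main :: IO ()
main = do
  mapM_ (print . parseThing) ["foobar", "foobaz", "foobarz", "foobarzz"]
  print (parseThingIncremental "foobar")
```

The first three inputs should come back as Right Foobar, Right Foobaz and Right Foobarz, the input with a trailing extra character as a Left, and the incremental call as Just Foobar.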

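    Running the parser with parseOnly is enough for thingParser1, but thingParser2 will still not work: its first alternative matches "foobar" and succeeds, so the "foobarz" alternative is never tried. For that ordering to work, each alternative needs its own endOfInput, so that a failed end-of-input check makes the whole alternative fail and the next one is tried. (attoparsec always backtracks on failure, so no explicit try is needed.) Run with parseOnly, this would work: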

    thingParser3 = Foobar  <$ PT.string "foobar"  <* PT.endOfInput
               <|> Foobaz  <$ PT.string "foobaz"  <* PT.endOfInput
               <|> Foobarz <$ PT.string "foobarz" <* PT.endOfInput
    

    A slightly better approach is to do a quick lookahead to see whether a z follows the foobar. You can do that like this:

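    -- Requires: import Control.Monad (guard)
    thingParser4 = Foobar  <$ (do
                     _ <- PT.string "foobar"
                     c <- PT.peekChar
                     -- succeed only at end of input or when the next character is not 'z'
                     guard (maybe True (/= 'z') c))
               <|> Foobaz  <$ PT.string "foobaz"
               <|> Foobarz <$ PT.string "foobarz"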
    

    However, this extra backtracking also degrades performance, so I would stick with the thingParser1 implementation, which tries the longest alternative first.
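
If the set of strings grows, the "longest alternative first" ordering of thingParser1 can be derived rather than maintained by hand. The following is only a sketch under that assumption: keywordParser is a hypothetical helper, not an attoparsec function, and Thing again carries Show and Eq instances purely for demonstration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.Text as PT
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Text (Text)
import qualified Data.Text as T

data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

-- Hypothetical helper: builds one parser from a keyword table,
-- trying longer keywords before any shorter keyword that prefixes them.
keywordParser :: [(Text, a)] -> PT.Parser a
keywordParser table =
  PT.choice [value <$ PT.string key | (key, value) <- longestFirst]
  where
    longestFirst = sortOn (Down . T.length . fst) table

-- Deliberately listed in the order that broke thingParser2.
thingParser :: PT.Parser Thing
thingParser =
  keywordParser [("foobar", Foobar), ("foobaz", Foobaz), ("foobarz", Foobarz)]

main :: IO ()
main = mapM_ (print . PT.parseOnly (thingParser <* PT.endOfInput))
             ["foobar", "foobaz", "foobarz"]
```

Sorting by descending length is sufficient here because a string can only be shadowed by a strictly shorter prefix of itself; the order among keywords of equal length does not matter.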