I have a parser that looks like this:
module Parser2 where
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok
data Expr
= Map [(Expr, Expr)]
| Int Integer
| Str String
deriving (Eq, Ord, Show)
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
where
ops = ["=>"]
names = []
style = emptyDef {
Tok.commentLine = "#"
, Tok.reservedOpNames = ops
, Tok.reservedNames = names
}
integer :: Parser Integer
integer = Tok.integer lexer
commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer
reserved :: String -> Parser ()
reserved = Tok.reserved lexer
int :: Parser Expr
int = Int <$> integer
maplit :: Parser Expr
maplit = do
reserved "{"
c <- commaSep $ do
k <- expr
reserved "=>"
v <- expr
return (k, v)
reserved "}"
return $ Map c
expr :: Parser Expr
expr = Ex.buildExpressionParser [] factor
stringLit :: Parser Expr
stringLit = Str <$> Tok.stringLiteral lexer
factor :: Parser Expr
factor = try stringLit
<|> try maplit
<|> try int
contents :: Parser a -> Parser a
contents p = do
Tok.whiteSpace lexer
r <- p
eof
return r
parseExpr :: String -> Either ParseError Expr
parseExpr = parse (contents expr) "<stdin>"
This can correctly parse "maps" with whitespace around integers but not without. For strings it does not seem to matter:
ghci> parseExpr "{\"a\"=> 1,\"b\"=> 2}"
Right (Map [(Str "a",Int 1),(Str "b",Int 2)])
ghci> parseExpr "{\"b\"=>\"c\"}"
Right (Map [(Str "b",Str "c")])
ghci> parseExpr "{\"a\"=>1}"
Left "<stdin>" (line 1, column 8):
unexpected '1'
expecting end of "=>"
I realize there is probably an obvious answer related to the makeTokenParser defaults, but it is not obvious to me why this happens and I haven't figured out how to properly debug it.
Any help is appreciated.
The reserved parser is intended for parsing keywords specifically, so it checks that the keyword is not used as a prefix to an identifier: in a language where let is reserved, you want to parse lettuce as an identifier, not as the keyword let followed by the identifier tuce. Which characters are allowed in identifiers is controlled by the identLetter field of GenLanguageDef, which in the case of emptyDef includes digits, so your =>1 does not parse as the reserved => followed by 1 since 1 could be part of an identifier.
In order to parse operators like =>, you should use reservedOp instead of reserved, which implements the same logic but for operators instead of identifiers, or just symbol if you don't care about the prefix check.