I have a parser that looks like this:
module Parser2 where
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok
data Expr
= Map [(Expr, Expr)]
| Int Integer
| Str String
deriving (Eq, Ord, Show)
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
where
ops = ["=>"]
names = []
style = emptyDef {
Tok.commentLine = "#"
, Tok.reservedOpNames = ops
, Tok.reservedNames = names
}
integer :: Parser Integer
integer = Tok.integer lexer
commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer
reserved :: String -> Parser ()
reserved = Tok.reserved lexer
int :: Parser Expr
int = Int <$> integer
maplit :: Parser Expr
maplit = do
reserved "{"
c <- commaSep $ do
k <- expr
reserved "=>"
v <- expr
return (k, v)
reserved "}"
return $ Map c
expr :: Parser Expr
expr = Ex.buildExpressionParser [] factor
stringLit :: Parser Expr
stringLit = Str <$> Tok.stringLiteral lexer
factor :: Parser Expr
factor = try stringLit
<|> try maplit
<|> try int
contents :: Parser a -> Parser a
contents p = do
Tok.whiteSpace lexer
r <- p
eof
return r
parseExpr :: String -> Either ParseError Expr
parseExpr = parse (contents expr) "<stdin>"
This can correctly parse "maps" with whitespace around integers but not without. For strings it does not seem to matter:
ghci> parseExpr "{\"a\"=> 1,\"b\"=> 2}"
Right (Map [(Str "a",Int 1),(Str "b",Int 2)])
ghci> parseExpr "{\"b\"=>\"c\"}"
Right (Map [(Str "b",Str "c")])
ghci> parseExpr "{\"a\"=>1}"
Left "<stdin>" (line 1, column 8):
unexpected '1'
expecting end of "=>"
I realize there is probably an obvious answer related to the makeTokenParser
defaults, but it is not obvious to me why this happens and I haven't figured out how to properly debug it.
Any help is appreciated.
The reserved
parser is intended for parsing keywords specifically, so it checks that the keyword is not used as a prefix to an identifier: in a language where let
is reserved, you want to parse lettuce
as an identifier, not as the keyword let
followed by the identifier tuce
. Which characters are allowed in identifiers is controlled by the identLetter
field of GenLanguageDef
, which in the case of emptyDef
includes digits, so your =>1
does not parse as the reserved =>
followed by 1
since 1
could be part of an identifier.
In order to parse operators like =>
, you should use reservedOp
instead of reserved
, which implements the same logic but for operators instead of identifiers, or just symbol
if you don't care about the prefix check.