haskellparsec

Why does my parsec lexer seem to be whitespace sensitive for integers only?


I have a parser that looks like this:

module Parser2 where

import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

data Expr
  = Map [(Expr, Expr)]
  | Int Integer
  | Str String
  deriving (Eq, Ord, Show)

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["=>"]
    names = []
    style = emptyDef {
               Tok.commentLine = "#"
             , Tok.reservedOpNames = ops
             , Tok.reservedNames = names
             }

integer :: Parser Integer
integer = Tok.integer lexer

commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

int :: Parser Expr
int = Int <$> integer

maplit :: Parser Expr
maplit = do
    reserved "{"
    c <- commaSep $ do
        k <- expr
        reserved "=>"
        v <- expr
        return (k, v)
    reserved "}"
    return $ Map c

expr :: Parser Expr
expr = Ex.buildExpressionParser [] factor

stringLit :: Parser Expr
stringLit = Str <$> Tok.stringLiteral lexer

factor :: Parser Expr
factor = try stringLit
      <|> try maplit
      <|> try int

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (contents expr) "<stdin>"

This can correctly parse "maps" with whitespace around integers but not without. For strings it does not seem to matter:

ghci> parseExpr "{\"a\"=> 1,\"b\"=> 2}"
Right (Map [(Str "a",Int 1),(Str "b",Int 2)])

ghci> parseExpr "{\"b\"=>\"c\"}"
Right (Map [(Str "b",Str "c")])

ghci> parseExpr "{\"a\"=>1}"
Left "<stdin>" (line 1, column 8):
unexpected '1'
expecting end of "=>"

I realize there is probably an obvious answer related to the makeTokenParser defaults, but it is not obvious to me why this happens and I haven't figured out how to properly debug it.

Any help is appreciated.


Solution

  • The reserved parser is intended for parsing keywords specifically, so it checks that the keyword is not used as a prefix to an identifier: in a language where let is reserved, you want to parse lettuce as an identifier, not as the keyword let followed by the identifier tuce. Which characters are allowed in identifiers is controlled by the identLetter field of GenLanguageDef, which in the case of emptyDef includes digits, so your =>1 does not parse as the reserved => followed by 1 since 1 could be part of an identifier.

    In order to parse operators like =>, you should use reservedOp instead of reserved, which implements the same logic but for operators instead of identifiers, or just symbol if you don't care about the prefix check.