I'm trying to write a parser for the Tiger language from modern compiler implementation in ML, and am stuck on one of the recursive types.
I have the following type
data LValue =
Id Atom
| RecordAccess LValue Atom
| ArraySubscript LValue Expression
with the following grammar:
lvalue -> id
-> lvalue.id
-> lvalue[exp]
id -> atom
exp -> (the big, overarching, everything-is-an-expression type)
And I'm trying to parse it with Parsec, but I'm getting stuck in an infinitely recursive loop. Here's my current base parser:
lvalueParser :: Parsec String () LValue
lvalueParser =
try (Id <$> (atomParser <* (notFollowedBy (char '.'))))
<|> try recordAccessParser
where recordAccessParser = (uncurry RecordAccess) <$> do {
record <- lvalueParser;
char '.';
atom <- atomParser;
return (record, atom)
}
(Note: I haven't yet attempted to implement anything to handle the ArrayAccess
portion)
Clearly, the infinite-loopiness happens when the recordAccessParser
calls back to the lvalueParser
.
I can change the recordAccessParser
thusly:
recordAccessParser = (uncurry RecordAccess) <$> do {
record <- atomParser;
char '.';
atom <- atomParser;
return (Id record, atom)
}
and it terminates. However, it will not parse record access more than a single level deep:
Parsec.parse lvalueParser "" "record_1.field_1.field_2"
#=> RecordAccess (Id record_1) (Id field_1)
And I expect
#=> RecordAccess (RecordAccess (Id record_1) (Id field_1)) (Id field_2)
I looked at chainl1
, but the type of the chaining parser is a -> a -> a
, and that doesn't match the type of LValue
that reflects the grammar. I also looked at many
; however I don't have a constant prefix for each term - the left recursion term is what I'm trying to parse into part of the result type.
I imagine I'm missing a specific concept of Parsec/parsing and would love to be pointed in the right direction. There are more types in the language that I'm writing a parser for that will have similar constructions.
Though you cannnot use chainl1
here,
You can define chainl1
-like combinator like this:
leftRec :: (Stream s m t)
=> ParsecT s u m a -> ParsecT s u m (a -> a) -> ParsecT s u m a
leftRec p op = rest =<< p
where
rest x = do f <- op
rest (f x)
<|> return x
I consulted here to implement this.
By using this combinator, lvalueParser
can be defined as follows:
lvalueParser :: Parser LValue
lvalueParser = leftRec idParser (recordAccessModifier <|> arraySubscriptModifier)
where
idParser = Id <$> atomParser
recordAccessModifier = do
a <- char '.' *> atomParser
return (\l -> RecordAccess l a)
arraySubscriptModifier = do
e <- between (char '[') (char ']') expParser
return (\l -> ArraySubscript l e)
Example:
main = parseTest lvalueParser "x.y[2].z"
-- => RecordAccess (ArraySubscript (RecordAccess (Id 'x') 'y') (ENat 2)) 'z'
where Atom
, Expression
, and their parsers are defined as follows:
type Atom = Char
atomParser :: Parser Atom
atomParser = letter <?> "atom"
data Expression = ENat Int
deriving Show
expParser :: Parser Expression
expParser = (ENat . read) <$> many digit