I'm trying to parse the following input using megaparsec (if this sounds familiar to the advent of code challenge from day 7 2022, it's because it is):
$ cd /
$ ls
dir a
123 foo.txt
dir d
$ cd a
$ ls
dir e
4567 f
78901 h.zip
$ cd e
I want to parse it into the following data structure:
data DirContent = File (Integer, String) | Directory String deriving stock (Show)
data Input
= Cd DirectoryName
| CdRoot -- cd /
| CdPop -- cd ..
| Ls [DirContent]
deriving stock (Show)
I came up with the following parser:
import Text.Megaparsec
import Text.Megaparsec.Char
parseFull :: Parsec Void Text [Input]
parseFull = inputParser `sepBy` newline
inputParser :: Parsec Void Text Input
inputParser =
CdRoot <$ string "$ cd /"
<|> CdPop <$ string "$ cd .."
<|> Cd <$> (string "$ cd " *> many alphaNumChar)
<|> Ls <$> ((string "$ ls" <* newline) *> dirContentParser `sepBy` newline)
dirContentParser :: Parsec Void Text DirContent
dirContentParser =
Directory <$> (string "dir " *> (pure <$> alphaNumChar))
<|> curry File <$> decimal <* char ' ' <*> many (alphaNumChar <|> char '.')
The idea is that all the lines after a $ ls
are the results of this command and should be contained in the parsed value. Therefore, the parser has to consume all lines after a $ ls
until the next line starting with $
(but without consuming it, because otherwise the following parser will fail).
The parser partly works, but only if there is at most one ls
command in it and only if it's the last command in the input.
When I run the parser with debug mode, I get the following error:
input> IN: "$ cd /<newline>$ ls<newline>dir d<newli <…>
input> MATCH (CERR): "$ cd /<newline>$ ls<newline>dir d<newli <…>
input> ERROR:
input> offset=33:
input> unexpected "$ cd"
input> expecting "dir " or integer
as far as I can tell, the ls
parser doesn't end at the next command. What I would need is a parser that stops when the next $
appears, but doesn't consume it and leave it to the next parser. I tried using try
, but it didn't change anything.
manyTill_
sounds great, but it consumes the token at the end.
Based on the suggestion in the comments, I came up with the following solution:
parseFull :: Parsec Void Text [Input]
parseFull = many inputParser
inputParser :: Parsec Void Text Input
inputParser =
CdRoot <$ string "$ cd /" <* splitter
<|> CdPop <$ string "$ cd .." <* splitter
<|> Cd <$> (string "$ cd " *> many alphaNumChar <* splitter)
<|> Ls <$> (string "$ ls" >> newline *> many (dirContentParser <* splitter))
where
splitter = optional newline
dirContentParser :: Parsec Void Text DirContent
dirContentParser =
Directory <$> (string "dir " *> (pure <$> alphaNumChar))
<|> curry File <$> decimal <* char ' ' <*> many (alphaNumChar <|> char '.')
It doesn't use sepBy
, but instead parses an optional newline at the end of each line (newline
optional because otherwise the parser fails if there's no newline at the end of the input)