I am writing a DSL for fun. I decided to use attoparsec because I was familiar with it.
I want to implement parsing of includes with relative filenames like this:
include /some/dir/file.ext
or URLs:
include http://blah.com/my/file.ext
So when the parser reaches an include, I want it to read the referenced resource, parse the whole thing, and append the result to the "outer" parsing state.
The problem is that although the parsing of these statements is easy, I can't run IO (as I understand it) within my Attoparsec parsers.
How do I use Attoparsec to achieve this? Do I chop the initial input up using some string filtering and then pass each "block" to parse and feed accordingly? Essentially a two-pass parse approach?
Attoparsec is pure (Data.Attoparsec.Internal.Types.Parser is not a transformer and doesn’t include IO), so you’re right that you can’t expand includes from within a parser directly.
Splitting the parser into two passes seems like the right approach: one pass acts like the C preprocessor, accepting a file with include statements interleaved with other stuff. The “other stuff” only needs to be basically lexically valid; it doesn’t have to satisfy your full parser, just as the C preprocessor only cares about tokens and matching parentheses, not matching other brackets or anything semantic. You then replace the includes, producing a fully expanded file that you can give to your existing parser.
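As a rough sketch of that preprocessor-style pass, assuming a line-oriented include directive: the names expandLines and preprocess, and the load parameter (for example Data.Text.IO.readFile, or something that fetches URLs), are made up for illustration. It has no cycle detection, so a file that includes itself would loop forever.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Expand every "include <path>" line into the lines of the referenced
-- resource, recursively. 'load' is a hypothetical loader supplied by the
-- caller; nothing here guards against circular includes.
expandLines :: (FilePath -> IO Text) -> Text -> IO [Text]
expandLines load src = concat <$> traverse expandLine (T.lines src)
  where
    expandLine l =
      case T.stripPrefix "include " l of
        Just path -> load (T.unpack (T.strip path)) >>= expandLines load
        Nothing   -> pure [l]

-- Produce one fully expanded text to hand to the existing pure parser.
preprocess :: (FilePath -> IO Text) -> Text -> IO Text
preprocess load src = T.unlines <$> expandLines load src
```

The expanded text can then go straight to parseOnly with your existing attoparsec parser, which stays pure.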
If an included file must be syntactically “standalone” in some sense†, then you can parse a whole file first, interleaved with includes, then replace them. For instance:
-- Whatever items you’re parsing.
data Item
-- A reference to an included path.
data Include = Include FilePath
parse :: Parser [Either Include Item]
-- Substitute includes; also calls ‘parse’
-- recursively until no includes remain.
substituteIncludes :: [Either Include Item] -> IO [Item]
† Say, if you’re just using attoparsec for lexing tokens that can’t cross file boundaries anyway, or you’re doing full parsing but want to disallow an include file that contains e.g. unmatched brackets.
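To make those signatures concrete, here is a minimal sketch with attoparsec. It assumes, purely for illustration, that an Item is one line of text and that an include occupies a whole line; document, line, and the load parameter are hypothetical names, and substituteIncludes takes the loader as an extra argument compared with the signature above. There is no cycle detection.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
  ( Parser, endOfInput, endOfLine, isEndOfLine, parseOnly
  , sepBy, string, takeTill, takeWhile1 )
import Data.Text (Text)
import qualified Data.Text as T

-- Placeholder item: one ordinary line (a real DSL would have a richer type).
newtype Item = Item Text deriving (Eq, Show)

-- A reference to an included path.
newtype Include = Include FilePath deriving (Eq, Show)

-- One line: either an include directive or an ordinary item.
line :: Parser (Either Include Item)
line = (Left <$> include) <|> (Right <$> item)
  where
    include = do
      _ <- string "include"
      _ <- takeWhile1 (== ' ')
      Include . T.unpack . T.strip <$> takeTill isEndOfLine
    item = Item <$> takeTill isEndOfLine

-- Pure first pass: a whole file with includes left in place.
document :: Parser [Either Include Item]
document = line `sepBy` endOfLine <* endOfInput

-- Second pass in IO: load each include, parse it with 'document',
-- and splice its items in, recursing until no includes remain.
substituteIncludes :: (FilePath -> IO Text) -> [Either Include Item] -> IO [Item]
substituteIncludes load parts = concat <$> traverse step parts
  where
    step (Right i) = pure [i]
    step (Left (Include path)) = do
      src <- load path
      case parseOnly document src of
        Left err    -> ioError (userError (path ++ ": " ++ err))
        Right inner -> substituteIncludes load inner
```

Because every included file has to get through document on its own, a file that is not valid by itself is rejected before anything is spliced in.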
The other option is to embed IO in your parser directly by using a different parsing library such as megaparsec, which provides a ParsecT transformer that you can wrap around IO to do IO directly in your parser. I would probably do this for a prototype, but it seems tidier to separate the concerns of parsing and expansion as much as possible.
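For comparison, a prototype along those lines might look roughly like this. It is only a sketch: Item is again a placeholder for one line of text, restOfLine, includeLine, and document are illustrative names, and it reads local files only, with no URL handling and no cycle detection.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.IO.Class (liftIO)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (eol, string)

-- A parser stacked on IO, so liftIO is available inside it.
type Parser = ParsecT Void Text IO

-- Placeholder item: one ordinary line.
newtype Item = Item Text deriving (Eq, Show)

-- Everything up to (not including) the end of the current line.
restOfLine :: Parser Text
restOfLine = takeWhileP (Just "line content") (\c -> c /= '\n' && c /= '\r')

-- Parse "include <path>", read the file, and parse it on the spot.
includeLine :: Parser [Item]
includeLine = do
  _ <- try (string "include" *> takeWhile1P (Just "space") (== ' '))
  path <- T.unpack . T.strip <$> restOfLine
  src <- liftIO (TIO.readFile path)
  result <- liftIO (runParserT document path src)
  case result of
    Left bundle -> fail (errorBundlePretty bundle)
    Right items -> pure items

-- A document is a sequence of lines; includes expand in place.
document :: Parser [Item]
document = concat <$> (line `sepBy` eol) <* eof
  where
    line = includeLine <|> ((\t -> [Item t]) <$> restOfLine)
```

Running it is a matter of runParserT document "main.dsl" src in IO. The trade-off is that the parser can no longer be run or tested without the included files being present on disk.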