I am writing a simple generalised parser combinator library. This means the library contains many small function objects, called parsers, which (when called) take a string as input and return a list of ParseResults as output, where a ParseResult is
template <typename A>
using ParseResult = std::pair<A, std::string>;
The list is empty if the parser did not match and contains a single result if it did; parsers that can match in multiple (ambiguous) ways may return more than one result.
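For concreteness, a minimal sketch of one such parser under this design could look like the following. `CharParser` and `example_usage` are hypothetical names made up for illustration, not part of any existing library; the point is only to show where the remaining input gets copied.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

template <typename A>
using ParseResult = std::pair<A, std::string>;

// Hypothetical parser: matches one specific character at the front of the input.
struct CharParser {
    char expected;

    std::vector<ParseResult<char>> operator()(const std::string& input) const {
        std::vector<ParseResult<char>> results;
        if (!input.empty() && input.front() == expected) {
            // Match: the remainder is a fresh copy of everything after
            // the first character, so every successful step copies.
            results.emplace_back(expected, input.substr(1));
        }
        return results;  // empty list means "no match"
    }
};

inline void example_usage() {
    CharParser a{'a'};
    auto results = a("abc");
    assert(results.size() == 1);
    assert(results[0].first == 'a');
    assert(results[0].second == "bc");
    assert(a("xyz").empty());
}
```

Each successful match allocates a new string for the remainder, which is where the cost described next comes from.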
However, this means that a lot of string copying is going on right now. Also, the finally constructed parser has to be called with a string, so all of std::cin (or the complete contents of a file) is first copied into a string.
Since the parsers only ever look at the first few characters at the current front of the string, a better idea seems to be to keep track of the current position in the standard input stream. I believe this is exactly what a std::istream is. However, istreams are not copyable. How can my problem be solved?
Is there a way to return a copy of an istream that points a few characters past where the original points? Or is there another, cleaner way to solve this problem?
The question can be rephrased thus: how do you represent the unparsed part of the input in a way that avoids excessive copying and allows input streaming?
The most flexible way is to represent it with an iterator. If your parsers backtrack, it needs to be a ForwardIterator; if not, an InputIterator is sufficient. This means you can use std::istream_iterator over std::cin or a std::ifstream directly, or parse from in-memory std::strings or char arrays. (Note that std::istream_iterator<char> skips whitespace unless std::noskipws is set on the stream; std::istreambuf_iterator<char> reads every character.) Streaming with backtracking is a bit more involved: you would have to either write a buffering iterator adaptor that turns an InputIterator such as std::istream_iterator into a ForwardIterator, or write an iterator that wraps std::ifstream directly and calls .seekg() when you need to backtrack.
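A minimal sketch of the iterator approach, assuming a parser that is templated on the iterator type and returns the position of the unparsed remainder instead of a string copy. `IterParseResult`, `CharParser` and `example_usage` are hypothetical names for illustration. The stream case uses std::istreambuf_iterator<char> rather than std::istream_iterator<char>, because the latter skips whitespace by default.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Pairs the parsed value with an iterator to the unparsed remainder,
// instead of a copy of the remaining string.
template <typename A, typename Iter>
using IterParseResult = std::pair<A, Iter>;

// Hypothetical parser: matches one specific character at the current position.
struct CharParser {
    char expected;

    template <typename Iter>
    std::vector<IterParseResult<char, Iter>> operator()(Iter first, Iter last) const {
        std::vector<IterParseResult<char, Iter>> results;
        if (first != last && *first == expected) {
            ++first;  // consume one character; no input is copied
            results.emplace_back(expected, first);
        }
        return results;
    }
};

inline void example_usage() {
    CharParser a{'a'};

    // In-memory input: std::string iterators are ForwardIterators, so a
    // failed alternative can restart from a previously saved iterator.
    const std::string text = "abc";
    auto from_string = a(text.begin(), text.end());
    assert(from_string.size() == 1);
    assert(*from_string[0].second == 'b');

    // Streaming input: std::istreambuf_iterator is only an InputIterator,
    // so this is suitable for parsers that never backtrack.
    std::istringstream stream("abc");
    std::istreambuf_iterator<char> it(stream), end;
    auto from_stream = a(it, end);
    assert(from_stream.size() == 1);
    assert(*from_stream[0].second == 'b');
}
```

The same parser object serves both inputs; only the iterator category decides whether backtracking is safe. With the stream iterator, all copies share one underlying read position, so an old copy cannot be used to rewind.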
Another option is to use C++17's std::string_view, which does not copy and has a nice, parsing-friendly interface. This does not solve streaming, though: you would still have to read the entire file first.
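As a rough sketch of the std::string_view variant, assuming the whole input already sits in a buffer that outlives every view into it. `ViewParseResult`, `LiteralParser` and `example_usage` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// The remainder is a view into the original buffer, not a copy of it.
template <typename A>
using ViewParseResult = std::pair<A, std::string_view>;

// Hypothetical parser: matches a fixed literal at the front of the input.
struct LiteralParser {
    std::string_view literal;

    std::vector<ViewParseResult<std::string_view>> operator()(std::string_view input) const {
        std::vector<ViewParseResult<std::string_view>> results;
        // substr clamps the length, so this is safe for short inputs.
        std::string_view head = input.substr(0, literal.size());
        if (head == literal) {
            // remove_prefix only advances the start of the view.
            input.remove_prefix(literal.size());
            results.emplace_back(head, input);
        }
        return results;
    }
};

inline void example_usage() {
    const std::string buffer = "let x";  // the owner must outlive all views
    LiteralParser let{"let"};
    auto results = let(buffer);
    assert(results.size() == 1);
    assert(results[0].first == "let");
    assert(results[0].second == " x");
    assert(let("x let").empty());
}
```

Because a std::string_view is just a pointer and a length, saving one before trying an alternative gives backtracking for free, at the cost of holding the full input in memory.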