I would like to parse the books of a library expressed in a format like this:
#Book title 1
Chapter 1
Chapter 2
#Book title 2
Chapter 1
Chapter 2
Chapter 3
As you can see, each book title is preceded by a # and the chapters of each book are the lines that follow it. It should be rather easy to create a parser for this.
So far, I have this code (parsers + tokenizer):
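using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

void Main()
{
    // '#' marks a title; a run of characters other than line breaks,
    // '#', ':', '=' or '-' is text; line breaks become whitespace tokens.
    var tokenizer = new TokenizerBuilder<PrjToken>()
        .Match(Superpower.Parsers.Character.EqualTo('#'), PrjToken.Hash)
        .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
        .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
        .Build();

    var input = @"#Book 1
Chapter 1
Chapter 2
#Book 2
Chapter 1
Chapter 2
Chapter 3";

    var library = MyParsers.Library.Parse(tokenizer.Tokenize(input));
}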

public enum PrjToken
{
    WhiteSpace,
    Hash,
    Text
}

public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }

    public Book(string title, string[] chapters)
    {
        Title = title;
        Chapters = chapters;
    }
}

public class Library
{
    public Book[] Books { get; }

    public Library(Book[] books)
    {
        Books = books;
    }
}

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Text.ManyDelimitedBy(Whitespace)
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.ManyDelimitedBy(Whitespace)
        select new Library(books);
}
The above code is ready to run in .NET Fiddle at this link: https://dotnetfiddle.net/3P5dAJ
Everything looks fine. However, something is wrong with the parser because I'm getting this error:
Syntax error (line 4, column 1): unexpected hash `#`, expected text.
What's wrong with my parsers?
You can solve this by parsing the chapters as a separate list, where each chapter ends with a whitespace token:
public static readonly TokenListParser<PrjToken, string> Chapter =
    from chapterName in Text
    from wh in Whitespace
    select chapterName;

public static readonly TokenListParser<PrjToken, Book> Book =
    from title in Title
    from chapters in Chapter.Many()
    select new Book(title, chapters);
In essence I think that when Text.ManyDelimitedBy(Whitespace) encounters the trailing whitespace (newline) at the end of Chapter 2, it will expect another chapter name, not the start of a new book. The parser cannot distinguish between the delimiter between chapters and the delimiter between books (both are whitespace, here a newline), and it will therefore expect another chapter, not the start of a new Book.
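To make that reasoning concrete, here is a minimal, self-contained sketch that does not use Superpower at all. It is only a simplified model of the two strategies, consistent with the error above, and not the library's implementation. The names DelimiterDemo, Kind, Sample, ManyDelimitedBy and ManyTerminatedBy are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DelimiterDemo
{
    // Hypothetical token kinds mirroring PrjToken from the question.
    public enum Kind { Hash, Text, WhiteSpace }

    // The chapter tokens of book 1, followed by the '#' that opens book 2.
    public static readonly Kind[] Sample =
    {
        Kind.Text, Kind.WhiteSpace, Kind.Text, Kind.WhiteSpace, Kind.Hash
    };

    // Model of "item (delimiter item)*": once a delimiter has been
    // consumed, another item is mandatory.
    public static int ManyDelimitedBy(IReadOnlyList<Kind> tokens, int pos, Kind item, Kind delimiter)
    {
        if (pos >= tokens.Count || tokens[pos] != item) return pos; // zero items
        pos++;
        while (pos < tokens.Count && tokens[pos] == delimiter)
        {
            pos++; // delimiter consumed, now committed to another item
            Expect(tokens, pos, item);
            pos++;
        }
        return pos;
    }

    // Model of "(item terminator)*": each item owns the token that ends it,
    // so the loop stops at the first token that is not an item.
    public static int ManyTerminatedBy(IReadOnlyList<Kind> tokens, int pos, Kind item, Kind terminator)
    {
        while (pos < tokens.Count && tokens[pos] == item)
        {
            pos++;
            Expect(tokens, pos, terminator);
            pos++;
        }
        return pos;
    }

    private static void Expect(IReadOnlyList<Kind> tokens, int pos, Kind expected)
    {
        if (pos < tokens.Count && tokens[pos] == expected) return;
        var found = pos < tokens.Count ? tokens[pos].ToString() : "end of input";
        throw new FormatException($"unexpected {found}, expected {expected}");
    }

    public static void Main()
    {
        try { ManyDelimitedBy(Sample, 0, Kind.Text, Kind.WhiteSpace); }
        catch (FormatException e) { Console.WriteLine(e.Message); } // unexpected Hash, expected Text

        Console.WriteLine(ManyTerminatedBy(Sample, 0, Kind.Text, Kind.WhiteSpace)); // 4
    }
}
```

In this model the delimited version fails on the '#' because it has already consumed the newline after Chapter 2, while the terminated version stops cleanly at index 4, leaving the '#' for the next book.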
By breaking the Chapter parser up into Text followed by a Whitespace token, you have removed this ambiguity.
Since you have now swallowed the Whitespace at the end of each chapter, the books are no longer delimited by a Whitespace token, and you have to change how the Library parser works as well:
public static readonly TokenListParser<PrjToken, Library> Library =
    from books in Book.Many()
    select new Library(books);
In addition to this, if you want the file to be parsed without a newline at the end of the file, you also have to make the Whitespace at the end of the Chapter optional:
public static readonly TokenListParser<PrjToken, string> Chapter =
    from chapterName in Text
    from wh in Whitespace.Optional()
    select chapterName;
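The effect of the optional terminator can be sketched without Superpower as well. This is again a simplified model under stated assumptions, not the library's behavior in every detail; OptionalTerminatorDemo, Kind and Chapters are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

public static class OptionalTerminatorDemo
{
    // Hypothetical token kinds mirroring PrjToken from the question.
    public enum Kind { Hash, Text, WhiteSpace }

    // Models "(Text WhiteSpace?)*" when optionalTerminator is true
    // and "(Text WhiteSpace)*" when it is false.
    // Returns the index of the first token that was not consumed.
    public static int Chapters(IReadOnlyList<Kind> tokens, bool optionalTerminator)
    {
        var pos = 0;
        while (pos < tokens.Count && tokens[pos] == Kind.Text)
        {
            pos++; // chapter name
            if (pos < tokens.Count && tokens[pos] == Kind.WhiteSpace)
                pos++; // line break after the chapter
            else if (!optionalTerminator)
                throw new FormatException("expected WhiteSpace after chapter");
        }
        return pos;
    }

    public static void Main()
    {
        // Two chapters with no line break after the last one.
        var tokens = new[] { Kind.Text, Kind.WhiteSpace, Kind.Text };

        Console.WriteLine(Chapters(tokens, optionalTerminator: true)); // 3

        try { Chapters(tokens, optionalTerminator: false); }
        catch (FormatException e) { Console.WriteLine(e.Message); } // expected WhiteSpace after chapter
    }
}
```

With a mandatory terminator the last chapter of the file is rejected because nothing follows it; making the terminator optional lets the same loop accept both files that end in a newline and files that do not.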
Putting it all together, the complete parser looks like this:
public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}