I'm learning how to create a parser in JavaCC. This is the kind of language we are expected to parse:
bus(59) ->
"Beach Shuttle"
at 9:30 10:30 11:30 12:00
13:00 14:00 15:00
via stops 3 76 44 89 161 32
free
bus(1234) ->
"The Hills Loop"
at 7:15 7:30 7:45 8:05 8:20 8:40 9:00
via stops 99 97 77 66 145 168
bus(7) -> "City Transit"
at
16:08 16:39 16:55 17:01 17:12 17:28
via
stops
2 1 5 7 13 119
Below is my implementation so far. I'll try to explain my thought process.
PARSER_BEGIN(MyParser)
import java.io.*;
public class MyParser
{
public static void parse(String fileName) throws IOException, ParseException
{
MyParser parser = new MyParser(new FileInputStream(fileName));
parser.dsl();
}
}
PARSER_END(MyParser);
//Remainder of the .jj file.
//Tokens to ignore in the BNF follow.
SKIP : { ' ' | '\t' | '\n' | '\r' }
TOKEN : {
< BUSNUMBER : "bus(["0"-"9"]) |
< BUSNAME : "(["a"-"z", "A"-"Z"])* //Match a single character which can be lowercase or upper. Happens 0 or more times.
< VIA : "via" > |
< STOPS : "stops" > |
< FREE : "free" >
}
// was used as a temporary comments indicator.
So I've defined the characters to skip over and all the tokens I can think of.
But I'm not sure what I'm missing. Any help would be appreciated, though an explanation would be even better, as I actually want to learn how to do this.
Thank you.
A few comments. For
< BUSNUMBER : "bus(["0"-"9"]) |
You perhaps mean
< BUSNUMBER : "bus(" (["0"-"9"])+ ")" > |
However, if you want to allow spaces, you should treat bus, (, ), and numbers as separate tokens.
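As a rough sketch of the separate-token approach, it could look something like the fragment below. The token names BUS, LPAREN, RPAREN and NUMBER and the production name busHeader are my own choices for illustration, not anything from your grammar. Note that the SKIP entries are written with double quotes, because JavaCC string literals follow Java string syntax.

```
// Whitespace is skipped between tokens, so "bus(59)" and "bus ( 59 )" both match.
SKIP : { " " | "\t" | "\n" | "\r" }

TOKEN : {
    < BUS    : "bus" >
  | < LPAREN : "(" >
  | < RPAREN : ")" >
  | < NUMBER : (["0"-"9"])+ >   // one or more digits
}

// Hypothetical production: the header of one bus entry, e.g. bus(59)
void busHeader() : {}
{
    <BUS> <LPAREN> <NUMBER> <RPAREN>
}
```

The nice side effect is that NUMBER can be reused later for the stop numbers.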
For
< BUSNAME : "(["a"-"z", "A"-"Z"])* //Match a single character which can be lowercase or upper. Happens 0 or more times.
you might want
< BUSNAME : "\"" (["a"-"z", "A"-"Z", " "])* "\""> |
(I don't know what characters are possible in a bus name, but in your example you have spaces as well as letters.)
You are missing tokens for ->, stop numbers and times.
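Putting the pieces together, a minimal sketch of a complete .jj file might look like the following. Treat it as one possible layout under some assumptions rather than the definitive answer: the token names ARROW, AT, TIME, NUMBER, BUS, LPAREN and RPAREN and the production name bus are mine, I am assuming times are always digits, a colon, then exactly two digits, that bus names contain only letters and spaces, and that free is optional and comes last in an entry.

```
PARSER_BEGIN(MyParser)
import java.io.*;

public class MyParser
{
    public static void parse(String fileName) throws IOException, ParseException
    {
        MyParser parser = new MyParser(new FileInputStream(fileName));
        parser.dsl();
    }
}
PARSER_END(MyParser)

// Whitespace between tokens is ignored.
SKIP : { " " | "\t" | "\n" | "\r" }

TOKEN : {
    < BUS     : "bus" >
  | < LPAREN  : "(" >
  | < RPAREN  : ")" >
  | < ARROW   : "->" >
  | < AT      : "at" >
  | < VIA     : "via" >
  | < STOPS   : "stops" >
  | < FREE    : "free" >
    // Assumed format: 9:30, 13:00. The longest match wins, so "9:30" is a TIME, not a NUMBER.
  | < TIME    : (["0"-"9"])+ ":" ["0"-"9"] ["0"-"9"] >
  | < NUMBER  : (["0"-"9"])+ >
    // Quoted name; assumed to contain only letters and spaces.
  | < BUSNAME : "\"" (["a"-"z", "A"-"Z", " "])* "\"" >
}

// The whole input: zero or more bus entries, then end of file.
void dsl() : {}
{
    ( bus() )* <EOF>
}

// One entry: bus(59) -> "Name" at <times> via stops <numbers> [free]
void bus() : {}
{
    <BUS> <LPAREN> <NUMBER> <RPAREN> <ARROW> <BUSNAME>
    <AT> ( <TIME> )+
    <VIA> <STOPS> ( <NUMBER> )+
    [ <FREE> ]
}
```

This only checks that the input is well formed. To actually do something with the data you would add Java actions inside the productions, for example capturing the matched Token objects and reading their image field.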