
What am I missing in my parser for this text in JavaCC?


I'm learning how to create a parser in JavaCC. This is the kind of language we are expected to parse:

bus(59) ->
    "Beach Shuttle"
    at 9:30 10:30 11:30 12:00
       13:00 14:00 15:00
    via stops 3 76 44 89 161 32
    free

bus(1234) ->
    "The Hills Loop"
    at 7:15 7:30 7:45 8:05 8:20 8:40 9:00
    via stops 99 97 77 66 145 168

bus(7) -> "City Transit"
at
    16:08 16:39 16:55 17:01 17:12 17:28
via
stops
    2 1 5 7 13 119

We have some rules that the parser needs to follow as well.

  1. We must ignore whitespace except inside "".
  2. We can have any number of bus declarations, and the order within each will always be the same.
  3. The bus name (in double quotes) will contain any number of characters.
  4. The times are in 24-hour format hh:mm and there must be at least one per bus declaration.
  5. The stop numbers are all pre-defined locations and there must be at least 2 per bus declaration.
  6. The word free may or may not be present for each bus declaration.

Below is my implementation so far. I'll try to explain my thought process.

PARSER_BEGIN(MyParser)
import java.io.*;
public class MyParser
{
    public static void parse(String fileName) throws IOException, ParseException
    {
        MyParser parser = new MyParser(new FileInputStream(fileName));
        parser.dsl();
    }
}

PARSER_END(MyParser);

//Remainder of the .jj file.
//Tokens to ignore in the BNF follow.

SKIP : { ' ' | '\t' | '\n' | '\r' }

TOKEN : {
    < BUSNUMBER : "bus(["0"-"9"]) |
    < BUSNAME : "(["a"-"z", "A"-"Z"])*  //Match a single character which can be lowercase or upper. Happens 0 or more times.
    < VIA : "via" > |
    < STOPS : "stops" > |
    < FREE : "free" > 
    
    }

// was used as a temporary comments indicator.

So I've created the characters to skip over, and all the tokens I can think of.

But I'm not sure what I'm missing. Any help would be appreciated, and an explanation would be even better, as I actually want to learn how to do this.

Thank you.


Solution

  • A few comments. For

        < BUSNUMBER : "bus(["0"-"9"]) |
    

    You perhaps mean

        < BUSNUMBER : "bus(" (["0"-"9"])+ ")" > |
    

    However, if you want to allow spaces, you should treat bus, (, ), and numbers as separate tokens.
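
    A minimal sketch of that split-token variant, assuming your SKIP rule already discards whitespace between tokens. The names BUS, LPAREN, RPAREN, NUMBER and the production busHeader are placeholders of mine, not something from your grammar:

    ```javacc
    TOKEN : {
        < BUS    : "bus" >
      | < LPAREN : "(" >
      | < RPAREN : ")" >
      | < NUMBER : (["0"-"9"])+ >   // one or more digits
    }

    // Hypothetical production: accepts bus(59) as well as bus ( 59 ),
    // because skipped whitespace may appear between any two tokens.
    void busHeader() : {}
    {
        <BUS> <LPAREN> <NUMBER> <RPAREN>
    }
    ```

    The trade-off is that the single-token version rejects any space inside bus(59), while this one leaves spacing to the SKIP rule.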

    For

        < BUSNAME : "(["a"-"z", "A"-"Z"])*  //Match a single character which can be lowercase or upper. Happens 0 or more times.
    

    you might want

       < BUSNAME : "\"" (["a"-"z", "A"-"Z", " "])* "\""> | 
    

    (I don't know what characters are possible in a bus name, but in your example you have spaces as well as letters.)
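
    If the rule that a name "will contain any number of characters" really means any characters, one option is to accept everything up to the closing quote. This is only a sketch: excluding line breaks is my assumption, and it does not handle an escaped quote inside a name.

    ```javacc
    TOKEN : {
        // Opening quote, then any run of characters that are not a quote
        // or a line break, then the closing quote.
        < BUSNAME : "\"" (~["\"", "\n", "\r"])* "\"" >
    }
    ```

    Because the whole quoted name is a single token, the spaces inside it are kept even though spaces are skipped everywhere else.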

    You are also missing tokens for ->, the stop numbers, and the times.
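
    Putting the pieces together, here is a rough sketch of how the rest of the .jj file could look. The token names ARROW, AT, TIME and NUMBER and the production bus() are my own choices; dsl() is the entry point your parse method already calls. I have also added an at keyword token, since the grammar needs one, and written the SKIP strings with double quotes, which is how JavaCC expects string literals as far as I know. TIME only checks the shape h:mm or hh:mm, not that the value is a valid 24-hour time.

    ```javacc
    SKIP : { " " | "\t" | "\n" | "\r" }

    TOKEN : {
        < BUSNUMBER : "bus(" (["0"-"9"])+ ")" >
      | < BUSNAME   : "\"" (["a"-"z", "A"-"Z", " "])* "\"" >
      | < ARROW     : "->" >
      | < AT        : "at" >
      | < VIA       : "via" >
      | < STOPS     : "stops" >
      | < FREE      : "free" >
        // One or two hour digits, a colon, two minute digits.
        // The longest match wins, so 9:30 becomes TIME, not NUMBER.
      | < TIME      : ["0"-"9"] (["0"-"9"])? ":" ["0"-"9"] ["0"-"9"] >
      | < NUMBER    : (["0"-"9"])+ >
    }

    // Any number of bus declarations, then end of input.
    void dsl() : {}
    {
        ( bus() )* <EOF>
    }

    // One declaration, in the fixed order from the question:
    // at least one time, at least two stops, optional "free".
    void bus() : {}
    {
        <BUSNUMBER> <ARROW> <BUSNAME>
        <AT> ( <TIME> )+
        <VIA> <STOPS> <NUMBER> ( <NUMBER> )+
        [ <FREE> ]
    }
    ```

    The "at least one" and "at least two" rules are expressed directly in the productions, so the parser should reject a declaration with too few times or stops without any extra Java code.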