rascal lexicon

Append text file to lexicon in Rascal


Is it possible to append terminals retrieved from a text file to a lexicon in Rascal? This would happen at run time, and I see no obvious way to achieve this. I would rather keep the data separate from the Rascal project. For example, if I had read in a list of countries from a text file, how would I add these to a lexicon (using the lexical keyword)?


Solution

  • In the data-dependent version of the Rascal parser this is even easier and faster, but we haven't released that yet. For now I'd write a generic rule with a post-parse filter, like so:

    rascal>set[str] lexicon = {"aap", "noot", "mies"};
    set[str]: {"noot","mies","aap"}
    rascal>lexical Word = [a-z]+;
    ok
    rascal>syntax LexiconWord = word: Word w;
    ok
    rascal>syntax Sentence = LexiconWord; // assumed: Sentence is used below but never defined in this session
    ok
    rascal>LexiconWord word(Word w) { // called when the LexiconWord.word rule is used to build a tree
    >>>>>>> if ("<w>" notin lexicon) 
    >>>>>>>   filter;  // remove this parse tree
    >>>>>>> else fail; // just build the tree
    >>>>>>>}
    rascal>[Sentence] "hello"
    |prompt:///|(0,18,<1,0>,<1,18>): ParseError(|prompt:///|(0,18,<1,0>,<1,18>))
            at $root$(|prompt:///|(0,64,<1,0>,<1,64>))
    rascal>[Sentence] "aap"
    Sentence: (Sentence) `aap`
    rascal>
    

    Because the filter function removed all possible derivations for "hello", the parser eventually returns a parse error on it. It does not do so for "aap", which is in the lexicon, so hurray. Of course you can build arbitrarily complex checks into this kind of filter. People sometimes write ambiguous grammars and then use filters like this one to disambiguate them.

    Parsing and filtering in this way takes worst-case cubic time in the length of the input, provided the filter function runs in amortized constant time. If the grammar itself can be parsed in linear time, then of course the entire process is also linear.
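
    To tie this back to the question: the lexicon set does not have to be written inline; it can be filled from a text file when the module is loaded. Below is a minimal sketch, not a tested implementation. It assumes a hypothetical file countries.txt in the home directory with one entry per line; the module name and the lexiconFile variable are made up for illustration. readFileLines and trim come from the standard IO and String modules, and the filter function is the same one as in the session above.

    ```rascal
    module LexiconFromFile

    import IO;      // readFileLines
    import String;  // trim

    // Hypothetical location of the word list (one entry per line); adjust as needed.
    loc lexiconFile = |home:///countries.txt|;

    // Read the file once at module load time, dropping surrounding whitespace and empty lines.
    set[str] lexicon = { trim(line) | str line <- readFileLines(lexiconFile), trim(line) != "" };

    lexical Word = [a-z]+;

    syntax LexiconWord = word: Word w;

    // Same post-parse filter as above: reject any Word that is not in the loaded lexicon.
    LexiconWord word(Word w) {
      if ("<w>" notin lexicon)
        filter;  // remove this parse tree
      else
        fail;    // just build the tree
    }
    ```

    Note that Word only matches lowercase letters here, so entries containing capitals or spaces (as real country names would) can never match; the lexical definition would have to be widened for such data.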