
Lexing The VHDL ' (tick) Token


In VHDL, the ' character can be used to delimit a character literal, e.g. '.', or as the tick in attribute names and qualified expressions (similar-ish to C++'s :: token), e.g. string'("hello").

The issue comes up when lexing a tick followed by a parenthesized list of character literals, e.g. string'('a','b','c'). In this case a naive lexer will incorrectly tokenize the first '(' as a character literal, and all of the following actual character literals will be mis-tokenized.
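To make the failure concrete, here is a minimal C sketch of such a naive rule. It is deliberately buggy and is not taken from any real lexer: a tick followed by any character and another tick is always taken as a character literal. The token kinds and the function name naive_lex are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this illustration only. */
typedef enum { K_ID, K_CHAR, K_TICK, K_LPAREN, K_RPAREN, K_COMMA, K_OTHER } Kind;

/* Naive rule: '<any char>' is always a character literal, whatever came before.
 * Writes at most max kinds into out and returns the number of tokens found. */
static size_t naive_lex(const char *s, Kind *out, size_t max)
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (*s != '\0' && n < max) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isalpha((unsigned char)*s)) {
            /* identifier: a letter followed by letters, digits, underscores */
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            out[n++] = K_ID;
        } else if (*s == '\'' && s[1] != '\0' && s[2] == '\'') {
            s += 3;               /* the bug: no context is consulted */
            out[n++] = K_CHAR;
        } else {
            switch (*s) {
            case '\'': out[n++] = K_TICK;   break;
            case '(':  out[n++] = K_LPAREN; break;
            case ')':  out[n++] = K_RPAREN; break;
            case ',':  out[n++] = K_COMMA;  break;
            default:   out[n++] = K_OTHER;  break;
            }
            s++;
        }
    }
    return n;
}
```

For string'('a','b','c') this produces ID CHAR ID CHAR ID CHAR ID TICK RPAREN: '(' and the two ',' become literals and a, b, c become identifiers, instead of the intended ID TICK LPAREN CHAR COMMA CHAR COMMA CHAR RPAREN.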

There is a 2007 thread in the comp.lang.vhdl Google group, titled "Lexing the ' char", which asks a similar question and has an answer by user diogratia:

        case '\'':                          /* IR1045 check */

            if (    last_token == DELIM_RIGHT_PAREN ||
                    last_token == DELIM_RIGHT_BRACKET ||
                    last_token == KEYWD_ALL ||
                    last_token == IDENTIFIER_TOKEN ||
                    last_token == STR_LIT_TOKEN ||
                    last_token == CHAR_LIT_TOKEN || ! (buff_ptr<BUFSIZ-2) )
                token_flag = DELIM_APOSTROPHE;
            else if (is_graphic_char(NEXT_CHAR) &&
                    line_buff[buff_ptr+2] == '\'') { CHARACTER_LITERAL:
                buff_ptr+= 3;               /* lead,trailing \' and char */
                last_token = CHAR_LIT_TOKEN;
                token_strlen = 3;
                return (last_token);
            }
            else token_flag = DELIM_APOSTROPHE;
            break;

See Issue Report IR1045: http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

As you can see from the above code fragment, the last token can be captured and used to disambiguate something like:

  foo <= std_logic_vector'('a','b','c');

without a large look ahead or backtracking.
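The same previous-token rule can be sketched as a small, self-contained C scanner. This is an illustrative sketch, not diogratia's lexer or flex output: the token kinds and ir1045_lex are hypothetical names, isprint is used as an ASCII-only stand-in for VHDL's graphic-character test, and <= is lexed as two one-character tokens for brevity. The set of tokens after which a tick is always the apostrophe delimiter is the one from the fragment above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
typedef enum {
    T_NONE, T_ID, T_ALL, T_STR, T_CHAR, T_TICK,
    T_LPAREN, T_RPAREN, T_RBRACKET, T_COMMA, T_OTHER
} Tok;

/* Tokens after which a tick can only be the apostrophe delimiter. */
static int forces_tick(Tok last)
{
    return last == T_RPAREN || last == T_RBRACKET || last == T_ALL ||
           last == T_ID || last == T_STR || last == T_CHAR;
}

/* Case-insensitive check for the reserved word "all". */
static int is_all(const char *p, size_t len)
{
    return len == 3 && tolower((unsigned char)p[0]) == 'a' &&
           tolower((unsigned char)p[1]) == 'l' &&
           tolower((unsigned char)p[2]) == 'l';
}

/* Tokenizes s into out (at most max entries); returns the token count. */
static size_t ir1045_lex(const char *s, Tok *out, size_t max)
{
    Tok last = T_NONE;   /* the only extra state the rule needs */
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (*s != '\0' && n < max) {
        const char *start = s;
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (isalpha((unsigned char)*s)) {
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            last = is_all(start, (size_t)(s - start)) ? T_ALL : T_ID;
        } else if (*s == '"') {
            /* string literal; a doubled "" is an embedded quote */
            for (s++; *s != '\0'; s++) {
                if (*s == '"') {
                    if (s[1] != '"') { s++; break; }
                    s++;
                }
            }
            last = T_STR;
        } else if (*s == '\'') {
            if (!forces_tick(last) && isprint((unsigned char)s[1]) &&
                s[2] == '\'') {
                s += 3;           /* lead and trailing tick plus the char */
                last = T_CHAR;
            } else {
                s++;
                last = T_TICK;
            }
        } else {
            switch (*s++) {
            case '(': last = T_LPAREN;   break;
            case ')': last = T_RPAREN;   break;
            case ']': last = T_RBRACKET; break;
            case ',': last = T_COMMA;    break;
            default:  last = T_OTHER;    break;
            }
        }
        out[n++] = last;
    }
    return n;
}
```

For foo <= std_logic_vector'('a','b','c'); this yields ID, <, =, ID, TICK, LPAREN, CHAR, COMMA, CHAR, COMMA, CHAR, RPAREN, ; using one variable of state and no backtracking.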

However, as far as I know, flex doesn't track the last token it returned.

Without having to manually keep track of the last parsed token, is there a better way to accomplish this lexing task?

I am using IntelliJ Grammar-Kit, if that helps.


Solution

  • The idea behind IR1045 is to be able to tell whether a single quote/apostrophe is part of a character literal without looking ahead, or backtracking when you guess wrong. Try:

    library ieee;
    use ieee.std_logic_1164.all;
    
    entity foo is
        port (
            a:      in      std_logic;
            b:      out     std_logic_vector (3 downto 0)
        );
    end entity;
    
    architecture behave of foo is
        begin
        b <= std_logic_vector'('0','1','1','0')     when a = '1' else
             (others =>'0')                         when a = '0' else
             (others => 'X');
    end architecture behave;
    

    How far ahead are you willing to look?

    There is, however, a practical example of flex disambiguation of apostrophes and character literals for VHDL.

    Nick Gasson's nvc uses flex, and he implemented an Issue Report 1045 solution in it.

    See nvc/src/lexer.l, which is licensed under GPLv3.

    Search for last_token:

    #define TOKEN(t) return (last_token = (t))
    

    and

    #define TOKEN_LRM(t, lrm)                                       \
       if (standard() < lrm) {                                      \
          warn_at(&yylloc, "%s is a reserved word in VHDL-%s",      \
                  yytext, standard_text(lrm));                      \
          return parse_id(yytext);                                  \
       }                                                            \
       else                                                         \
          return (last_token = (t));
    

    An added function to check it:

    static int resolve_ir1045(void);
    
    static int last_token = -1;
    

    which is:

    %%
    
    static int resolve_ir1045(void)
    {
       // See here for discussion:
       //   http://www.eda-stds.org/isac/IRs-VHDL-93/IR1045.txt
       // The set of tokens that may precede a character literal is
       // disjoint from that which may precede a single tick token.
    
       switch (last_token) {
       case tRSQUARE:
       case tRPAREN:
       case tALL:
       case tID:
          // Cannot be a character literal
          return 0;
       default:
          return 1;
       }
    }
    

    The IR1045 location has changed since the comp.lang.vhdl post; it's now:

    http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

    You'll also want to search for resolve_ir1045 in lexer.l.

    static int resolve_ir1045(void);
    

    and

    {CHAR}            { if (resolve_ir1045()) {
                           yylval.s = strdup(yytext);
                           TOKEN(tID);
    

    Here nvc uses the function as a filter: the {CHAR} rule only produces a character literal when resolve_ir1045() says the leading single quote can start one.
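    To see why a macro is enough to make a flex-style scanner "track" the last token, here is a sketch that emulates the pattern in plain C, with a pull-style scan() standing in for yylex(). The TOKEN macro and resolve_ir1045() follow the excerpts above; the token values, scan_init() and scan() are hypothetical, and the character-literal branch only approximates the {CHAR} rule, whose full action is not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes: names follow the nvc excerpt, values are made up. */
enum { tEOF, tID, tTICK, tLPAREN, tRPAREN, tRSQUARE, tALL, tOTHER };

static const char *cursor;      /* plays the role of flex's input position */
static int last_token = -1;

/* Every action returns through TOKEN, so last_token is always up to date. */
#define TOKEN(t) return (last_token = (t))

/* As in the excerpt: nonzero means a quote may start a character literal. */
static int resolve_ir1045(void)
{
   switch (last_token) {
   case tRSQUARE:
   case tRPAREN:
   case tALL:
   case tID:
      return 0;              /* cannot be a character literal */
   default:
      return 1;
   }
}

static void scan_init(const char *src)
{
   assert(src != NULL);
   cursor = src;
   last_token = -1;
}

/* Returns one token per call, like yylex(). */
static int scan(void)
{
   const char *start;

   while (isspace((unsigned char)*cursor))
      cursor++;
   if (*cursor == '\0')
      TOKEN(tEOF);

   if (*cursor == '\'') {
      /* approximates the {CHAR} rule guarded by resolve_ir1045() */
      if (resolve_ir1045() && cursor[1] != '\0' && cursor[2] == '\'') {
         cursor += 3;
         TOKEN(tID);          /* the excerpt returns character literals as tID */
      }
      cursor++;
      TOKEN(tTICK);
   }

   start = cursor;
   if (isalpha((unsigned char)*cursor)) {
      while (isalnum((unsigned char)*cursor) || *cursor == '_')
         cursor++;
      if (cursor - start == 3 && tolower((unsigned char)start[0]) == 'a'
          && tolower((unsigned char)start[1]) == 'l'
          && tolower((unsigned char)start[2]) == 'l')
         TOKEN(tALL);
      TOKEN(tID);
   }

   /* everything else is a one-character token */
   switch (*cursor++) {
   case '(': TOKEN(tLPAREN);
   case ')': TOKEN(tRPAREN);
   case ']': TOKEN(tRSQUARE);
   default:  TOKEN(tOTHER);
   }
}
```

    Because every return passes through TOKEN, last_token needs no other bookkeeping. Following the excerpt, a character literal comes back as tID, so in this sketch a tick right after one (as in '1''a) is also resolved as an apostrophe.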

    This was originally an Ada issue. IR1045 was never adopted but is universally used. There are probably Ada flex lexers that demonstrate the same disambiguation.

    The requirement to disambiguate is discussed in the article "Lexical Analysis" in Ada User Journal Volume 27 Number 3 (September 2006), on PDF pages 30 and 31 (volume pages 159 and 160), where we see that the solution is not well known.

    The comment that character literals do not precede a single quote is inaccurate:

    entity ir1045 is
    end entity;
    
    architecture foo of ir1045 is
    begin
    THIS_PROCESS:
        process
            type twovalue is ('0', '1');  
            subtype string4 is string(1 to 4);
            attribute a: string4;
            attribute a of '1' : literal is "TRUE";
        begin
            assert THIS_PROCESS.'1''a /= "TRUE"
                report "'1''a /= ""TRUE"" is FALSE";
            report "This_PROCESS.'1''a'RIGHT = " &
                integer'image(This_PROCESS.'1''a'RIGHT);
            wait;
        end process;
    end architecture;
    

    The first use of an attribute whose prefix is a selected name with a character literal suffix demonstrates the inaccuracy; the second report statement shows that it can matter:

    ghdl -a ir1045.vhdl
    ghdl -e ir1045
    ghdl -r ir1045
    ir1045.vhdl:13:9:@0ms:(assertion error): '1''a /= "TRUE" is FALSE
    ir1045.vhdl:15:9:@0ms:(report note): This_PROCESS.'1''a'RIGHT = 4
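    The same point can be made at the lexical level. The following sketch (hypothetical names, ASCII only, a hand-written scanner rather than flex) applies the previous-token rule to the prefix This_PROCESS.'1''a'RIGHT from the example, once with character literals among the tokens that force a tick and once without.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { X_NONE, X_ID, X_CHAR, X_TICK, X_RPAREN, X_OTHER } Kind;

/* Lexes s into out (at most max entries) and returns the token count.
 * If char_lit_forces_tick is 0, a tick right after a character literal
 * may open another character literal, which is the inaccuracy at issue. */
static size_t lex_tick_rule(const char *s, int char_lit_forces_tick,
                            Kind *out, size_t max)
{
    Kind last = X_NONE;
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (*s != '\0' && n < max) {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        if (isalpha((unsigned char)*s)) {
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            last = X_ID;
        } else if (*s == '\'') {
            int forced = last == X_ID || last == X_RPAREN ||
                         (char_lit_forces_tick && last == X_CHAR);
            if (!forced && isprint((unsigned char)s[1]) && s[2] == '\'') {
                s += 3;           /* character literal */
                last = X_CHAR;
            } else {
                s++;              /* apostrophe delimiter */
                last = X_TICK;
            }
        } else {
            last = (*s == ')') ? X_RPAREN : X_OTHER;
            s++;
        }
        out[n++] = last;
    }
    return n;
}
```

    With character literals in the set the result is ID . CHAR TICK ID TICK ID ('1', then 'a, then 'RIGHT); without them the lexer swallows 'a' as a second character literal and the attribute name is lost.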
    

    In addition to an attribute name prefix containing a selected name with a character literal suffix, there's a requirement that an attribute specification 'decorate' a declared entity (of an entity_class, see IEEE Std 1076-2008 7.2 Attribute specification) in the same declarative region in which the entity is declared.

    This example is syntactically and semantically valid VHDL. You may note that nvc doesn't allow decorating a named entity with the entity class literal; that does not conform to 7.2.

    Enumeration literals are declared in type declarations, here type twovalue. An enumerated type that has at least one character literal as an enumeration literal is a character type (5.2.2.1).