
Lemon parser token value with void * type


I was trying to use a void* token type for my Lemon parser, but I ran into a strange problem.

Initially I used a custom token type, a struct holding the token's value. Then I switched to void* because my token values have different types.

Here is some of my parser code:

expression(A) ::= expression(B) PLUS expression(C). { *((double *)A)=  *((double *)B)  +  *((double *)C) ; }
expression(A) ::= expression(B) MINUS expression(C). { *((double *)A)= *((double *) B)  -  *((double *)C) ;  }
expression(A) ::= expression(B) MULT expression(C). { *((double *)A)=  *((double *)B)  *   *((double *)C) ; }
expression(A) ::= expression(B) DIV expression(C). {
        if( *((double *)C)  != 0)
                *((double *)A)=  *((double *)B)  /  *((double *)C) ;
        else
                printf("Math Error!");
}

expression(A) ::= number(B). { *((double *)A)=  *((double *)B) ;}
number ::= INT.
number ::= FLOAT.

And here is the driver loop from my lexer, which is an re2c file:

while ((token = lex()) != EOL) {
        sy[size].val = tkn.val;

        parse(parser, token, &sy[size].val);
        size++;
}

sy[size].val has type double.

But when I enter 1+2 it returns 4, and when I enter 1+4 it returns 8.

My guess is that the parser pushes the rightmost value onto its stack and uses it wherever it sees a token parameter.


Solution

  • Here's a simple but erroneous program:

    double* add_indirect(double* b, double* c) {
      double *a;
      *a = *b + *c;    /* Undefined behaviour! */
      return a;        /* This, too! */
    }
    

    It should be clear why that program is wrong: a has never been initialized. Its declaration says it is a pointer to a double, but it is never made to point to anything. So when line 3 tries to store a value through that pointer, random memory is modified: whatever the uninitialised pointer happened, by chance, to point to. Then that random pointer is returned by the function, where its use will create more havoc.

    If the programmer is lucky, they will get a segmentation fault when line 3 is executed, because the random uninitialized value of a is not a valid pointer. But it is quite possible that the value which is picked up from the stack is a valid pointer. It might, for example, be the value of b, placed on the stack in order to call the function. (Most modern compilers don't use the call stack like this, but similar things can happen.)
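    For comparison, here is a minimal sketch of two correct versions of that function, using the hypothetical names add_value and add_into: either return the result by value, or make the pointer refer to storage that the caller provides.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Fix 1 (hypothetical name): return the sum by value. No pointer is
       involved, so there is nothing left uninitialised. */
    double add_value(double b, double c) {
      return b + c;
    }

    /* Fix 2 (hypothetical name): the caller owns the storage for the result
       and passes its address, so a always points at a real object. */
    void add_into(double *a, const double *b, const double *c) {
      assert(a != NULL && b != NULL && c != NULL);  /* documents the precondition */
      *a = *b + *c;
    }
    ```

    For example, with double sum; the call add_into(&sum, &x, &y) writes into sum, which the caller declared, rather than into whatever memory an uninitialised pointer happens to name.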

    Now, let's look at the actions in your program.

    expression(A) ::= expression(B) PLUS expression(C). {
        *((double *)A)=  *((double *)B)  +  *((double *)C) ;
    }
    

    Making A, B and C void* and casting them to double* makes that action harder to read, but it is recognisably the same as line 3 in the failed program above. A Lemon action is supposed to set the value of the left-hand-side non-terminal (represented by A in this case), but that code assumes that A already has a value, producing the same undefined behaviour as above. Again, a segmentation fault would have been a lucky outcome, since it would probably have highlighted the problem; but in a generated parser, unlike in ordinary compiled code, it is highly likely that the uninitialised value of A happens to be some value already on the parser stack.
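    If the semantic values really had to be pointers, each action would first have to give A something to point at. A minimal sketch of that idea as a plain C function (make_sum is a hypothetical name, not part of Lemon):

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical helper showing what a pointer-valued PLUS action would
       have to do: allocate storage for the result before storing into it,
       instead of writing through whatever value A happened to hold. */
    void *make_sum(const void *b, const void *c) {
      assert(b != NULL && c != NULL);
      double *a = malloc(sizeof *a);
      if (a == NULL)
        return NULL;                 /* the caller must handle allocation failure */
      *a = *(const double *)b + *(const double *)c;
      return a;                      /* the caller now owns this allocation */
    }
    ```

    In the grammar this would be used as { A = make_sum(B, C); }. Every such allocation then has to be freed somewhere (Lemon's %destructor directive exists for this kind of cleanup), which is extra bookkeeping that the value-type version below avoids.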

    I can't see any obvious reason why you need the semantic values of tokens in this calculator to be pointers to anything. Doing that complicates your code considerably; for example, you are forced to store every tokenised value in an array (which could overflow if the input text is too large) so that they all have unique addresses. It would be much, much simpler to just use a value type:

    %include {
    #include <stdio.h>
    }

    %token_type { double }
    %default_type { double }
    
    expression(A) ::= expression(B) PLUS expression(C).  { A = B + C; }
    expression(A) ::= expression(B) MINUS expression(C). { A = B - C; }
    expression(A) ::= expression(B) MULT expression(C).  { A = B * C; }
    expression(A) ::= expression(B) DIV expression(C).   {
            if (C != 0)
              A = B / C;
            else
              fprintf(stderr, "%s\n", "Math Error! Divide by zero.");
    }
    
    expression(A) ::= number(B). { A = B; }
    number(A) ::= INT(B).   { A = B; }
    number(A) ::= FLOAT(B). { A = B; }
    

    Then your driver becomes simple:

    while ((token = lex()) != EOL) {
            parse(parser, token, tkn.val);
    }
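    Passing tkn.val by value is what makes the sy array unnecessary. Here is a small sketch, independent of Lemon, of why the pointer version needed a distinct slot per token: if every token's address were taken from one reused variable, every stored pointer would read back the most recent value, which is the kind of aliasing the question guesses at. The names current_val, keep_pointers and keep_values are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for a lexer that reuses one variable (like
       tkn.val) for every token it returns. */
    static double current_val;

    /* Keeps a pointer per token: every entry gets the SAME address, so after
       the loop all of them read back the last token's value. */
    void keep_pointers(const double *input, size_t n, const double **out) {
      assert(input != NULL && out != NULL);
      for (size_t i = 0; i < n; i++) {
        current_val = input[i];
        out[i] = &current_val;
      }
    }

    /* Keeps a copy per token: later tokens cannot overwrite earlier ones. */
    void keep_values(const double *input, size_t n, double *out) {
      assert(input != NULL && out != NULL);
      for (size_t i = 0; i < n; i++) {
        current_val = input[i];
        out[i] = current_val;
      }
    }
    ```

    With the input 1, 2, 4, keep_pointers leaves three pointers that all read 4, while keep_values keeps 1, 2 and 4. Copying values, as the driver above does, needs no per-token storage at all.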
    

    Apparently, you intend values to be of different types. Making the values pointers does not help with that goal, because a pointer in C, even a void*, is implemented as a raw memory address; it does not record any type information. There is no way to query a pointer to find out what type of data it points at. (So making number either a pointer to a double or a pointer to an int loses the information about which it originally was.)

    If you want this functionality, your token type will need to be either a union (if every token and non-terminal has a specific type) or your own implementation of what is usually called a "discriminated union", i.e. a struct which contains both a union and an enumeration value indicating which member of the union is valid. In neither case is the value a pointer (except where the token value really is a pointer, such as a character string); the semantic value is the direct value of the token object, even if that value is a (hopefully small) struct.
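    A minimal sketch of such a discriminated union in C, assuming just two kinds of number; the names value_kind, struct value, make_int, make_double, as_int, as_double and value_add are all hypothetical:

    ```c
    #include <assert.h>

    /* Which member of the union is currently valid. */
    enum value_kind { VALUE_INT, VALUE_DOUBLE };

    /* A discriminated (tagged) union: the kind field carries the type
       information that a bare void * cannot. */
    struct value {
      enum value_kind kind;
      union {
        long i;
        double d;
      } u;
    };

    struct value make_int(long i) {
      struct value v = { .kind = VALUE_INT, .u.i = i };
      return v;
    }

    struct value make_double(double d) {
      struct value v = { .kind = VALUE_DOUBLE, .u.d = d };
      return v;
    }

    /* Reading the wrong member is a logic error, so check the tag first. */
    long as_int(struct value v) {
      assert(v.kind == VALUE_INT);
      return v.u.i;
    }

    /* Any number can be read as a double; ints are converted. */
    double as_double(struct value v) {
      return v.kind == VALUE_INT ? (double)v.u.i : v.u.d;
    }

    /* int + int stays int; if either operand is a double, so is the result. */
    struct value value_add(struct value b, struct value c) {
      if (b.kind == VALUE_INT && c.kind == VALUE_INT)
        return make_int(b.u.i + c.u.i);
      return make_double(as_double(b) + as_double(c));
    }
    ```

    To use it with Lemon, the struct definition would have to be visible to the generated parser (for example through an %include block), struct value would be named in %token_type and %default_type, and the PLUS action would become { A = value_add(B, C); }. The values are still passed around by value, not through pointers.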