I have these tokens defined in my lex file:
(?xi:
ADC|AND|ASL|BIT|BRK|CLC|CLD|CLI|CLV|CMP|CPX|
DEY|EOR|INC|INX|INY|JMP|JSR|LDA|LDX|LDY|LSR|
NOP|ORA|PHA|PHP|PLA|PLP|ROL|ROR|RTI|RTS|SBC|
SEC|SED|SEI|STA|STX|STY|TAX|TAY|TSX|TXA|TXS|
TYA|CPY|DEC|DEX
) {
yylval.str = strdup(yytext);
for(char *ptr = yylval.str; *ptr = tolower(*ptr); ptr++);
return MNEMONIC;
}
[\(\)=Aa#XxYy,:\+\-\<\>] {
return *yytext;
}
\$[0-9a-fA-F]{4} {
yylval.str = strdup(yytext);
return ABSOLUTE;
}
\$[0-9a-fA-F]{2} {
yylval.str = strdup(yytext);
return ZEROPAGE;
}
And this is how I parse them in bison:
struct addr_offset {
char *str;
int offset;
};
%union {
char *str;
int number;
struct addr_offset *ao;
}
%type<str> MNEMONIC
%type<str> ABSOLUTE
%type<ao> zp
%token ZEROPAGE
expression:
MNEMONIC { statement(0, $1, NULL, "i"); }
| MNEMONIC zp { statement(5, $1, $2, }
;
zp:
ZEROPAGE { $$->str = strdup($1); }
| '>' ABSOLUTE { $$->str = strdup($2); }
| '<' ABSOLUTE { $$->str = strdup($2); }
;
The weird thing is that if I add the last two alternatives to the zp rule, the MNEMONIC is not read correctly in the expression rule.
If you don't set $$ in a rule, bison will by default initialize it with the value of $1. If $1 has a different %type than $$ is expecting, bad things will happen.
In the case you are describing, it will likely be the value associated with the < or > token. Since the lexer code doesn't set yylval for those tokens, the value is whatever happens to be left over from the previous token: in this case, the string allocated with strdup for MNEMONIC. So when you assign to $$->str, the string's memory is treated as if it were the data structure in question, and the first 4 or 8 characters of the string (one pointer's worth) are overwritten with the pointer to the other string being assigned there.
So the likely result is heap corruption, which shows up as bad or corrupted opcodes when you go to look at them.
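The aliasing described above can be reproduced in isolation. This is only a sketch, not bison's generated code: `semantic_value` is a hypothetical stand-in for the `%union`, and `stale_value_aliases_string` is an illustrative helper. It assumes a mainstream platform where all object pointers share one representation. It compares addresses only and never writes through the mistyped pointer, since that write is exactly the corruption being described.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct addr_offset {
    char *str;
    int offset;
};

/* Hypothetical stand-in for the %union that bison turns into YYSTYPE. */
union semantic_value {
    char *str;
    int number;
    struct addr_offset *ao;
};

/*
 * Mimics the lexer storing a MNEMONIC string in yylval.str, followed by a
 * '<' or '>' token that is returned without touching yylval.
 * Returns 1 if the stale value, read through the ao member, points at the
 * string's own bytes, 0 if it does not, and -1 on allocation failure.
 */
int stale_value_aliases_string(const char *mnemonic)
{
    assert(mnemonic != NULL);

    union semantic_value yylval_like;
    char *copy = malloc(strlen(mnemonic) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, mnemonic);

    yylval_like.str = copy;   /* what the MNEMONIC rule stores */
    /* '<' would be scanned here; nothing is stored for it */
    struct addr_offset *wrong = yylval_like.ao;   /* what $$ starts out as */

    /* Writing wrong->str at this point would overwrite the mnemonic text. */
    int aliases = ((void *)wrong == (void *)copy);
    free(copy);
    return aliases;
}
```

Under the stated assumption, the helper reports that the mistyped pointer and the mnemonic string occupy the same address, which is why storing into `$$->str` clobbers the opcode text.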
So with the addition of the %union/%type declarations, we can see what is happening: you're allocating a string and then treating the string's memory as a struct addr_offset, which causes heap corruption and undefined behavior.
You need your actions that return a struct addr_offset to actually allocate one:
zp:
ZEROPAGE { $$ = malloc(sizeof(struct addr_offset)); $$->str = $1; }
| '>' ABSOLUTE { $$ = malloc(sizeof(struct addr_offset)); $$->str = $2; }
| '<' ABSOLUTE { $$ = malloc(sizeof(struct addr_offset)); $$->str = $2; }
;
Note that you don't need a strdup here, as the string has already been allocated in the lexer code, and you're just transferring ownership of that string from the token to the new struct addr_offset you're creating.
You might want to encapsulate the creation of the object in a function:
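struct addr_offset *new_ao(char *addr) {
    struct addr_offset *rv = malloc(sizeof(struct addr_offset));
    rv->str = addr;  /* takes ownership of the string from the lexer */
    rv->offset = strtol(addr + 1, 0, 16);  /* skip the leading '$' */
    return rv;
}
Then your actions just become, e.g., { $$ = new_ao($1); }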
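Because the struct now owns the string, whatever eventually discards the struct has to release both allocations. A minimal sketch of such a counterpart follows; `free_ao` is a hypothetical helper that does not appear in the question or the answer, and it assumes the string was heap-allocated (for example by `strdup` in the lexer).

```c
#include <stdlib.h>

struct addr_offset {
    char *str;
    int offset;
};

/*
 * Hypothetical counterpart to new_ao (not part of the original code).
 * Assumes ao->str is NULL or heap-allocated and owned by the struct.
 * Returns NULL so the caller can clear its pointer in the same statement.
 */
struct addr_offset *free_ao(struct addr_offset *ao)
{
    if (ao != NULL) {
        free(ao->str);   /* the string the lexer allocated */
        free(ao);        /* the struct allocated in the grammar action */
    }
    return NULL;
}
```

A call such as `ao = free_ao(ao);` would release the operand once it is no longer needed. Where that call belongs depends on whether `statement` keeps the pointer, which the question does not show.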