I tried to compile a very simple program with gcc using the -masm=intel
option, but the assembler fails with "Error: invalid use of register".
// test.c
#include <stdio.h>
size_t offset;
int main(int argc, char **argv, char **envp)
{
    offset = 20;
    return 0;
}
result:
$ gcc test.c -masm=intel
/tmp/ccPnqEmz.s: Assembler messages:
/tmp/ccPnqEmz.s:19: Error: invalid use of register
But when I remove the assignment statement, it compiles normally:
// test.c
#include <stdio.h>
size_t offset;
int main(int argc, char **argv, char **envp)
{
    //offset = 20;
    return 0;
}
result:
$ gcc test.c -masm=intel
$
When I compared the assembly output of the two versions, the only difference was that the former has one extra store instruction, which implements the offset = 20 assignment:
mov QWORD PTR offset[rip], 20
Why does the GNU assembler choke on this instruction GCC emitted?
This is because offset is a keyword in Intel-syntax assembly.
The GNU assembler in Intel-syntax mode follows the Microsoft assembler convention that mentioning a bare symbol name in an assembly instruction produces a memory operand with that symbol’s address, and not an immediate operand. To choose an immediate-operand interpretation, one needs to put the offset keyword before the symbol.
mov esi, var # loads the value from memory at var
mov esi, offset var # loads the address of var
NASM has the opposite convention, where one would instead write:
mov esi, [var] ; loads the value from memory at var
mov esi, var ; loads the address of var
In GNU assembler’s AT&T syntax, these would be:
movl var, %esi # loads the value from memory at var
movl $var, %esi # loads the address of var
When the GNU assembler parses the code produced by the compiler, the word offset is interpreted as a keyword and not as a symbol name, which confuses the assembler’s parser. To disambiguate the meaning when writing assembly code by hand, you can put the symbol name in quotes:
mov QWORD PTR "offset"[rip], 20
This will be correctly parsed and interpreted by the assembler. (NASM uses a $ prefix on the symbol name for that purpose.)
The same problem appears if you give a variable the same name as a register, and the same fix applies. Intel syntax is rarely used with the GNU assembler, least of all as the output format when compiling code with GCC, so it was easy for this bug to slip through.