
A variable named "offset" causes "Error: invalid use of register" to appear when using "-masm=intel" in gcc, but no error in AT&T mode


I tried to compile a very simple program using gcc with the -masm=intel option, but the assembler fails with "Error: invalid use of register".

// test.c
#include <stdio.h>
size_t offset;

int main(int argc, char **argv, char **envp)
{
        offset = 20;
        return 0;
}

result:

$ gcc test.c -masm=intel
/tmp/ccPnqEmz.s: Assembler messages:
/tmp/ccPnqEmz.s:19: Error: invalid use of register

But when I comment out the assignment statement, it compiles normally:

// test.c
#include <stdio.h>
size_t offset;

int main(int argc, char **argv, char **envp)
{
        //offset = 20;
        return 0;
}

result:

$ gcc test.c -masm=intel
$

Comparing the assembly output of the two versions, the only difference I found is that the first contains one extra store instruction, which implements the offset = 20 assignment:

        mov     QWORD PTR offset[rip], 20

Why does the GNU assembler choke on this instruction GCC emitted?


Solution

  • This is because offset is a keyword in Intel-syntax assembly.

    The GNU assembler in Intel-syntax mode follows the Microsoft assembler convention that mentioning a bare symbol name in an assembly instruction produces a memory operand with that symbol’s address, and not an immediate operand. To choose an immediate-operand interpretation, one needs to put the offset keyword before the symbol.

            mov     esi, var         # loads the value from memory at var
            mov     esi, offset var  # loads the address of var
    

    NASM has the opposite convention, where one would instead write:

            mov     esi, [var]       ; loads the value from memory at var
            mov     esi, var         ; loads the address of var
    

    In GNU assembler’s AT&T syntax, these would be:

            movl    var, %esi        # loads the value from memory at var
            movl    $var, %esi       # loads the address of var
    

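    When the GNU assembler parses the code produced by the compiler, the word offset is interpreted as a keyword and not as a symbol name. The operand is then presumably read as the offset operator applied to [rip] rather than as a reference to a symbol called offset, which would explain the complaint about a register. To disambiguate the meaning when writing assembly code by hand, you can put the symbol name in quotes: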

            mov     QWORD PTR "offset"[rip], 20
    

    This will be correctly parsed and interpreted by the assembler. (NASM uses $ for that purpose.)

    The same problem appears if you give a variable the same name as a register, and the same fix applies. Intel syntax is rarely used with the GNU assembler, and even less often for compiler-generated code, since GCC emits AT&T syntax by default, so it was very easy for this bug to slip through.
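Quoting only helps with hand-written assembly; you cannot make GCC quote the names it emits. One possible workaround on the C side, short of renaming the variable, is to give it a different assembler-level name with a GCC asm label. The sketch below is an assumption about how that could look, not something from the question: the name user_offset and the helper functions set_offset and get_offset are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical workaround: the C identifier stays "offset", but the symbol
   GCC writes into the assembly is "user_offset" (an arbitrary name chosen
   for this example), so the Intel-syntax keyword never appears there. */
size_t offset __asm__("user_offset");

/* Illustrative helper: stores to the variable under its C name. */
void set_offset(size_t value)
{
    offset = value;
}

/* Illustrative helper: reads the variable back. */
size_t get_offset(void)
{
    return offset;
}
```

Note that the asm label also changes the name the linker sees, so every translation unit that declares the variable would need the same label. If nothing outside your own code depends on the name, simply renaming the variable is probably the easier fix.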