
Intel Assembly ljmp syntax from AT&T syntax


I am trying to convert the xv6 boot code from AT&T syntax to Intel syntax, and I have a problem with the ljmp instruction. I am trying to learn the boot process of Intel computers, and I am not particularly strong with Intel assembly.

The original AT&T syntax is ljmp $0x8, $start32.

Minimal example:

.code16
   jmp 0x8:start32          # won't assemble

.code32
start32:
   nop

Using as -32 -msyntax=intel -mnaked-reg foo.s with GNU Binutils 2.35.1 produces
Error: junk ':start32' after expression for the far jmp line.

I am using GNU as and the GCC tools.
There might also be other problems with the assembly, such as in gdtdesc and gdt.

The full code ported to Intel syntax is:

# Start the first CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with cs = 0 and ip = 7c00.
.code16
.global start
start:
    # Disable interrupts.
    cli

    # Zero data segment registers DS, ES, and SS.
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax

seta20.1:
    # Wait for not busy.
    in al, 0x64
    test al, 0x2
    jnz seta20.1

    # 0xd1 -> port 0x64
    mov al, 0xd1
    out 0x64, al

seta20.2:
    # Wait for not busy.
    in al, 0x64
    test al, 0x2
    jnz seta20.2

    # 0xdf -> port 0x60
    mov al, 0xdf
    out 0x60, al

    # Switch from real to protected mode. Use a bootstrap GDT that makes
    # virtual addresses map directly to physical addresses so that the
    # effective memory map doesn't change during the transition.
    lgdt gdtdesc

    # Protection Enable in cr0 register.
    mov eax, cr0
    or eax, 0x1
    mov cr0, eax

    # Complete the transition to 32-bit protected mode by using a long jmp
    # to reload cs and eip. The segment descriptors are set up with no
    # translation, so that the mapping is still the identity mapping.

    # This instruction is giving me problems.
    ljmp start32, 0x8

.code32
start32:
    # Set up the protected-mode data segment registers
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov ss, ax

    # Zero the segments not ready for use.
    xor ax, ax
    mov fs, ax
    mov gs, ax

    # Set up the stack pointer and call into C.
    mov esp, start
    call bootmain

    # If bootmain returns spin.. ??
spin:
    hlt
    jmp spin

# Bootstrap GDT set up null segment, code segment, and data segment respectively.
# Force 4 byte alignment.
.p2align 2
gdt:
    .word 0x0000, 0x0000
    .byte 0, 0, 0, 0
    .word 0xffff, 0x0000
    .byte 0, 0x9a, 0xcf, 0
    .word 0xffff, 0x0000
    .byte 0, 0x92, 0xcf, 0

# sizeof(gdt) - 1 and address of gdt respectively.
gdtdesc:
    .word (gdtdesc - gdt - 1)
    .long gdt

Solution

  • You can use jmp 0x08, start32

    For some reason, jmp 0x8:start32 only works after .intel_syntax noprefix, even with command line args that should be equivalent. This is the syntax used by Binutils objdump -d -Mintel -mi8086, e.g. ea 16 00 08 00 jmp 0x8:0x16 so it's probably a GAS bug that it's not accepted sometimes.


    I edited your question to create a small reproducible example with as 2.35.1 (which I have on Arch GNU/Linux) based on your comments replying to Jester. I included command line options: I assume you must have been using those because there's no .intel_syntax noprefix directive in your file.

    That seems to be the problem: -msyntax=intel -mnaked-reg makes other Intel syntax things work, like xor ax,ax, but does not make jmp 0x8:start32 work (or other ways of writing it). Only a .intel_syntax noprefix directive (see footnote 1) makes that syntax for far jmp work.

    # .intel_syntax noprefix        # rely on command line options to set this
    .code16
       movzx   ax, al           # verify that command-line setting of intel_syntax worked, otherwise this line errors.
    
       ljmp 0x8, start32        # Working before or after a syntax directive, but is basically AT&T syntax
    #   jmp 0x8:start32          # fails here, works after a directive
       jmp 0x8, start32         # Michael Petch's suggested syntax that's still somewhat AT&Tish.  works with just cmdline opts. 
    
    .att_syntax
       ljmp $0x8, $start32      # working everywhere, even with clang
    .intel_syntax noprefix
       jmp 0x8:start32          # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive
    
    .code32
    start32:
       nop
    

    I verified that -msyntax=intel -mnaked-reg work for other instructions where their effect is necessary: movzx ax, al assembles. Without -mnaked-reg we'd get "too many memory references" because "ax" and "al" would be taken as symbol names, and without -msyntax=intel we'd get "operand size mismatch".

    A GAS listing from as -32 -msyntax=intel -mmnemonic=intel -mnaked-reg -o foo.o foo.s -al --listing-lhs-width=2 --listing-rhs-width=140
    (I'm pretty sure -mmnemonic=intel is irrelevant, and implied by syntax=intel.)

    Note that you can see which instructions worked because they have machine code, and which didn't (the first jmp 0x8:start32) because the left-hand column is empty for it. The very first column would normally be addresses, but is ???? because assembly failed. (Because I uncommented the jmp 0x8:start32 to show it failing the first time, working the 2nd time.)

    foo.s: Assembler messages:
    foo.s:6: Error: junk `:start32' after expression
    GAS LISTING foo.s                       page 1
    
    
       1                            # .intel_syntax noprefix        # rely on command line options to set this
       2                            .code16
       3 ???? 0FB6C0                   movzx   ax, al              # verify that command-line setting of intel_syntax worked, otherwise this line errors.
       4                       
       5 ???? EA170008 00              ljmp 0x8, start32        # Working before or after a syntax directive, but is basically AT&T syntax
       6                               jmp 0x8:start32          # fails here, works after a directive
       7 ???? EA170008 00              jmp 0x8, start32         # Michael Petch's suggested syntax that's still somewhat AT&Tish.  works with just cmdline opts. 
       8                       
       9                            .att_syntax
      10 ???? EA170008 00              ljmp $0x8, $start32      # working everywhere, even with clang
      11                            .intel_syntax noprefix
      12 ???? EA170008 00              jmp 0x8:start32          # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive
      13                       
      14                            .code32
      15                            start32:
      16 ???? 90                       nop
      17                       
    

    (GAS does listing field widths for the left column in "words", which apparently means 32-bit chunks. That's why the 00 most-significant byte of the segment selector is separated by a space.)
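
    The listing shows the whole encoding: opcode EA, then the 16-bit offset, then the 16-bit selector. If no spelling of the far jmp is accepted in a given build, one possible fallback is to emit those bytes with data directives, which do not depend on the syntax mode. A minimal sketch, assuming 16-bit mode and a target offset below 64 KiB; it was not part of the test file above, and start32 is simply reused from it:

    ```asm
    .code16
        .byte 0xea           # opcode: direct far jmp (ptr16:16 in 16-bit mode)
        .word start32        # 16-bit offset, stored little-endian
        .word 0x8            # 16-bit segment selector

    .code32
    start32:
        nop
    ```

    The cost is readability: a disassembler will still show a far jmp, but the source no longer says so except in the comments.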

    Putting a label before the jmp 0x8:label didn't help; it's not an issue of forward vs. backward reference. Even jmp 0x8:23 fails to assemble.
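
    Applied to the boot code in the question, the simplest fix that follows from this is to put the directive in the file itself instead of relying on command-line options. A trimmed sketch, assuming the rest of the file stays as posted; only the directive and the far jmp line differ from the question:

    ```asm
    .intel_syntax noprefix       # set the syntax mode in the file itself
    .code16
    .global start
    start:
        cli
        # (A20 enable, lgdt gdtdesc, and setting CR0.PE as in the question)
        jmp 0x8:start32          # selector 0x8 (code segment), offset start32

    .code32
    start32:
        mov ax, 0x10             # data segment selector
        mov ds, ax
        # (rest of the 32-bit setup as in the question)
    ```

    With the directive present, the jmp here should behave like line 12 of the listing above. That was only checked with Binutils 2.35.1, so other versions may differ.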


    Syntax "recommended" by disassemblers, from a working build:

    objdump -drwC -Mintel -mi8086 foo.o :

    foo.o:     file format elf32-i386
    
    Disassembly of section .text:
    
    00000000 <start32-0x17>:
       0:   0f b6 c0                movzx  ax,al
       3:   ea 17 00 08 00          jmp    0x8:0x17 4: R_386_16     .text
       8:   ea 17 00 08 00          jmp    0x8:0x17 9: R_386_16     .text
       d:   ea 17 00 08 00          jmp    0x8:0x17 e: R_386_16     .text
      12:   ea 17 00 08 00          jmp    0x8:0x17 13: R_386_16    .text
    
    00000017 <start32>:
      17:   90                      nop
    

    llvm-objdump --mattr=+16bit-mode --x86-asm-syntax=intel -d foo.o :

    00000000 <.text>:
           0: 0f b6 c0                      movzx   ax, al
           3: ea 17 00 08 00                ljmp    8, 23
           8: ea 17 00 08 00                ljmp    8, 23
           d: ea 17 00 08 00                ljmp    8, 23
          12: ea 17 00 08 00                ljmp    8, 23
    
    00000017 <start32>:
          17: 90                            nop
    

    And BTW, I didn't get clang 11.0 to assemble any Intel-syntax versions of this with a symbol name. ljmp 8, 12 assembles with clang, but not even ljmp 8, start32. Only by switching to AT&T syntax and back could I get clang's built-in assembler (clang -m32 -masm=intel -c) to emit a 16-bit mode far jmp.

    .att_syntax
       ljmp $0x8, $start32      # working everywhere, even with clang
    .intel_syntax noprefix
    

    Keep in mind this direct form of far JMP is not available in 64-bit mode; perhaps that's why LLVM's built-in assembler appears to have spent less effort on it.


    Footnote 1: Actually .intel_syntax prefix works, too, but never use that. Nobody wants to see the franken-monster that is mov %eax, [%eax], or especially add %edx, %eax that uses dst, src order but with AT&T decorated register names.