
Enable NEON on ARM Cortex-A series


I want to initialize the NEON coprocessor on a bare-metal Cortex-A15. Following ARM's instructions, I wrote this sequence at the end of my platform init code:

MOV r0, #0x00F00000
MRC p15, 0, r0, c1, c1, 2
ORR r0, r0, #0x0C00 
BIC r0, r0, #0xC000 
MCR p15, 0, r0, c1, c1, 2
ISB
MRC p15, 4, r0, c1, c1, 2
BIC r0, r0,  #0x0C00
BIC r0, r0, #(3<<14)
MCR p15, 4, r0, c1, c1, 2
ISB
MOV r3, #0x40000000
VMSR FPEXC, r3

I get this error:

Error: operand 0 must be FPSCR -- `vmsr FPEXC,r3'

I am using arm-eabi-as --version:

GNU assembler (GNU Binutils) 2.21
Copyright 2010 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `arm-eabi'.

If I change FPEXC to FPSCR, the program assembles, but running it raises an unhandled exception:

MRC p15, 4, r0, c1, c1, 2

Solution

  • A sequence for initializing the VFPU can be found in the U-Boot source:

    .macro init_vfpu
      ldr r0, =(0xF << 20)
      mcr p15, 0, r0, c1, c0, 2
      mov r3, #0x40000000
      .long 0xeee83a10
      /* vmsr FPEXC, r3 */
    .endm /* init_vfpu */
    

    As documented on the binutils mailing list, the vmsr FPEXC bug has been fixed on the binutils 2.23 branch, on HEAD, and on the 2.24 development branch, which will be released shortly. The fix is included in the 2.23.1 and 2.23.2 releases of binutils, so the 2.21 assembler from the question needs either an upgrade or the hand-encoded .long shown above.

    Here is a sample session with a fixed assembler:

    $ cat t.S
    init_vpu:
      ldr r0, =(0xF << 20)
      mcr p15, 0, r0, c1, c0, 2
      mov r3, #0x40000000
      vmsr FPEXC, r3
      bx  lr
      .ltorg
    $ arm-none-linux-gnueabi-as -march=armv7-a -mcpu=cortex-a15 -mfpu=neon t.S -o t.o
    $ arm-none-linux-gnueabi-as --version | grep assembler
    GNU assembler (crosstool-NG hg+default-86a8d1d467c8) 2.23.1
    This assembler was configured for a target of `arm-none-linux-gnueabi'.
    $ objdump --version | grep Binutils
    GNU objdump (GNU Binutils for Ubuntu) 2.23.2
    $ objdump -S t.o 
    
    t.o:     file format elf32-littlearm
    
    Disassembly of section .text:
    
    00000000 <init_vpu>:
       0:   e3a0060f        mov     r0, #15728640   ; 0xf00000
       4:   ee010f50        mcr     15, 0, r0, cr1, cr0, {2}
       8:   e3a03101        mov     r3, #1073741824 ; 0x40000000
       c:   eee83a10        vmsr    fpexc, r3
      10:   e12fff1e        bx      lr
    

    The above sequence should work on all Cortex-A series processors. It assumes a system without virtualization or TrustZone active.
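
As a cross-check on the magic numbers in the sequence above, here is a minimal C sketch that derives them from the bit fields as I understand them from the ARMv7-A architecture manual: the cp10/cp11 access fields in CPACR, the EN bit in FPEXC, and the A1 encoding of VMSR. The names used here (cpacr_enable_cp10_cp11, fpexc_enable, encode_vmsr, vfp_sysreg) are made up for illustration and are not part of U-Boot or binutils. It runs on any host and only computes the values; it does not touch any coprocessor register.

```c
#include <stdint.h>

/* Hypothetical helper: CPACR value granting full access to cp10 and cp11.
 * Assumption: cp10 is bits [21:20], cp11 is bits [23:22], 0b11 = full access. */
uint32_t cpacr_enable_cp10_cp11(void)
{
    const uint32_t full_access = 0x3u;
    return (full_access << 20) | (full_access << 22);
}

/* Hypothetical helper: FPEXC value with only the EN bit (bit 30) set. */
uint32_t fpexc_enable(void)
{
    return 1u << 30;
}

/* Register selector values used in the "reg" field of VMRS/VMSR. */
enum vfp_sysreg {
    VFP_FPSID = 0x0,
    VFP_FPSCR = 0x1,
    VFP_FPEXC = 0x8
};

/* Hypothetical helper: build the ARM (A1) opcode word for
 *   VMSR <reg>, r<rt>
 * with condition AL. Layout assumed:
 *   cond[31:28]=1110, [27:20]=1110 1110, reg[19:16], Rt[15:12],
 *   [11:0]=1010 0001 0000. */
uint32_t encode_vmsr(enum vfp_sysreg reg, unsigned rt)
{
    return 0xEEE00A10u
         | ((uint32_t)reg << 16)
         | ((uint32_t)(rt & 0xFu) << 12);
}
```

With these definitions, cpacr_enable_cp10_cp11() gives 0x00F00000 (the 0xF << 20 loaded into r0), fpexc_enable() gives 0x40000000, and encode_vmsr(VFP_FPEXC, 3) gives 0xEEE83A10, which is the word emitted by the .long directive and the one objdump disassembles as vmsr fpexc, r3.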