Using B instructions in Cortex-M3 (thumb)


I read that on the Cortex-M3, which is Thumb-only, whenever we write to the PC we must make sure the target address LSB is '1' to ensure the processor stays in Thumb mode.

Also, when we use 'BX reg', the value in reg must have LSB = 1 to select Thumb mode.

What about the case when we use 'B label' on a Cortex-M3? This 'label' will have a value with LSB = 0, since 16-bit/32-bit instructions are aligned to even addresses. Isn't 'B label' equivalent to 'PC := label'?

Are 'B label' and 'BL label' exceptions, where writing the PC does not affect the processor mode?

Thank you.


Solution

  • The target address needs to have its lsbit set to 1 for the bx (and blx) instruction; the 1 is stripped when the value goes into the pc. The b instruction is pc-relative, and the math shown in the ARM docs makes it clear that its target is always even.

    Generally you don't have to worry about this at all if you let the tools do their job.

    thumb.s

    .thumb
    
    .globl _start
    _start:
        b reset
        nop
        nop
    .thumb_func
    reset:
        nop
        nop
        nop
        nop
        ldr r0,=reset
        bx r0
    

    then

    arm-none-eabi-as thumb.s -o thumb.o
    arm-none-eabi-ld -Ttext=0x1000 thumb.o -o thumb.elf
    arm-none-eabi-objdump -D thumb.elf 
    

    which gives

    thumb.elf:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00001000 <_start>:
        1000:   e001        b.n 1006 <reset>
        1002:   46c0        nop         ; (mov r8, r8)
        1004:   46c0        nop         ; (mov r8, r8)
    
    00001006 <reset>:
        1006:   46c0        nop         ; (mov r8, r8)
        1008:   46c0        nop         ; (mov r8, r8)
        100a:   46c0        nop         ; (mov r8, r8)
        100c:   46c0        nop         ; (mov r8, r8)
        100e:   4801        ldr r0, [pc, #4]    ; (1014 <reset+0xe>)
        1010:   4700        bx  r0
        1012:   10070000    andne   r0, r7, r0
        ...
    

    The branch takes care of itself:

        1000:   e001        b.n 1006 <reset>
    ...    
    00001006 <reset>:
    

    The offset encoded in a branch is in units of 16-bit halfwords, not bytes; the processor multiplies it by 2 (shifts it left by one) and adds it to the pc, so the resulting byte address is always even. The pc is never odd; it is the value you feed bx or blx that is odd.
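    As a worked check of that arithmetic, here is a small C model of how the b.n at 0x1000 (encoding e001) resolves to 0x1006. It is a sketch, not code from any toolchain: it assumes the 16-bit unconditional B encoding (T2: the bits 11100 followed by an 11-bit signed halfword offset) and that the pc reads as the branch's own address plus 4 in Thumb state. The function name thumb_b_t2_target is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: target of a 16-bit Thumb unconditional branch
 * (b.n, encoding T2 = 11100 followed by imm11). imm11 counts halfwords,
 * so it is doubled to get a byte offset; the base is the instruction
 * address + 4 (Thumb pc read-ahead). The result is always even. */
static uint32_t thumb_b_t2_target(uint32_t insn_addr, uint16_t insn)
{
    assert((insn >> 11) == 0x1C);      /* top five bits must be 11100 */
    int32_t imm11 = insn & 0x7FF;      /* signed offset in halfwords */
    if (imm11 & 0x400)                 /* sign-extend from 11 bits */
        imm11 -= 0x800;
    int32_t byte_offset = imm11 * 2;   /* halfwords -> bytes */
    return insn_addr + 4u + (uint32_t)byte_offset;
}
```

    With the values from the listing, thumb_b_t2_target(0x1000, 0xE001) gives 0x1006, and e7fe (the classic branch-to-self) at 0x1000 gives back 0x1000. There is no lsbit anywhere in that calculation.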

    Now, because I used .thumb_func before reset, the assembler knows that reset is a Thumb label, not an ARM label. So when I asked it to load the address of reset into r0, it allocated a literal word with the value 0x00001007. That word shows up oddly in the disassembly (objdump decodes it as an ARM instruction), but it is there, and the lsbit has been set for us:

    00001006 <reset>:
     ...
        100e:   4801        ldr r0, [pc, #4]    ; (1014 <reset+0xe>)
        1010:   4700        bx  r0
        1012:   10070000    andne   r0, r7, r0
    
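    The ldr r0,[pc,#4] at 0x100e and the odd-looking 10070000 at 0x1012 can both be checked with a little arithmetic. The C sketch below assumes the 16-bit LDR (literal) encoding (01001, a 3-bit register, an 8-bit word offset), a pc that reads as the instruction address plus 4 rounded down to a multiple of 4, and that the halfword at 0x1012 is zero padding in front of the literal. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address loaded by a 16-bit Thumb
 * "ldr rt, [pc, #imm]" (LDR literal, encoding 01001 Rt imm8).
 * Base is Align(pc, 4) where pc = instruction address + 4. */
static uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint16_t insn)
{
    assert((insn >> 11) == 0x09);                  /* top five bits 01001 */
    uint32_t imm32 = (uint32_t)(insn & 0xFF) << 2; /* words -> bytes */
    uint32_t base = (insn_addr + 4u) & ~3u;        /* word-aligned pc */
    return base + imm32;
}

/* Read a little-endian 32-bit word from a byte image, which is what
 * objdump does when it dumps the word at 0x1012 as an ARM instruction. */
static uint32_t read_le32(const uint8_t *p)
{
    assert(p != 0);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

    thumb_ldr_literal_addr(0x100e, 0x4801) comes out to 0x1014, matching the objdump comment. Reading the bytes 00 00 07 10 00 00 (starting at 0x1012) as a little-endian word gives 0x10070000, which is what objdump printed, while the same read starting at 0x1014 gives the real literal 0x00001007.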

    Now, if you were to remove the .thumb_func, you would get

        100c:   46c0        nop         ; (mov r8, r8)
        100e:   4801        ldr r0, [pc, #4]    ; (1014 <reset+0xe>)
        1010:   4700        bx  r0
        1012:   10060000    andne   r0, r6, r0
    

    The assembler thinks reset is an ARM address, does not set the lsbit, and this code would crash. If you are concerned about it you can always add an extra orr r0,#1, but that is really just a hack. Learn, for whichever assembler you are using, how to declare the label as a Thumb label rather than an ARM one. Yes, it seems stupid that the GNU assembler knows this code is Thumb because we told it so, yet it can't figure out that labels within Thumb code are... Thumb labels. Very stupid tool.
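    To make the crash concrete, here is a C model of bx on a Thumb-only core. It is a sketch of the ARMv7-M behaviour as I understand it (bit 0 goes into the EPSR T bit, the pc gets the value with bit 0 cleared, and T == 0 makes the next instruction take an INVSTATE UsageFault). It ignores the special exception-return values, and the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "bx rN" on a Thumb-only core such as Cortex-M3.
 * Exception-return (EXC_RETURN) values are deliberately not modelled. */
typedef struct {
    uint32_t pc;   /* value that lands in the pc: bit 0 cleared */
    bool faults;   /* true if the next instruction would fault (T == 0) */
} bx_result;

static bx_result cortex_m_bx(uint32_t target)
{
    bx_result r;
    r.pc = target & ~1u;            /* the lsbit never reaches the pc */
    r.faults = (target & 1u) == 0;  /* no ARM state exists to switch to */
    assert((r.pc & 1u) == 0);
    return r;
}
```

    Feeding it 0x00001007 (the literal built with .thumb_func) gives a pc of 0x1006 and no fault. Feeding it 0x00001006 (the literal without .thumb_func) gives the same pc but faults, and 0x1006 | 1, which is what the orr r0,#1 hack computes, avoids the fault.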

    I would assume there are other, more verbose GNU assembler directives that also let you declare this a function or a Thumb label. And of course every assembler is different, so don't assume that GNU assembler directives work with other assemblers.

    If you mix C and asm, the C compiler is not stupid: it knows that -mthumb makes all the functions and globals (labels) Thumb, and depending on how and where you use them in the code, the linker places the correct value. It can even go so far as to switch modes for you: for a bl main in Thumb code where main is ARM code, it places a trampoline in the code that switches modes, or vice versa. At least I have seen the tools do this (and demonstrated it a number of times in Stack Overflow answers). I don't remember whether it was tricky to get working. You should always disassemble periodically to make sure the linker is doing this for you; if it isn't, make it do it, or fall back on doing it yourself.

    so

    Remember that only bx and blx need the lsbit set to branch to Thumb code, and the two lsbits clear to branch to ARM code. bx and blx strip that lsbit and leave an even value in the pc (easy to check: do a mov r0,pc in Thumb code and look at the result).
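    On a core that has both ARM and Thumb state (an ARM7TDMI, for example, not a Cortex-M), the same rule can be sketched in C as below. This is a simplified model: bit 0 set selects Thumb and is cleared, bit 0 clear selects ARM and the target should be word aligned; a target ending in binary 10 is architecturally unpredictable, and here it is simply flagged as invalid. The enum, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of bx/blx (register) interworking on a core with
 * both ARM and Thumb state. Not applicable to Thumb-only Cortex-M. */
typedef enum { STATE_ARM, STATE_THUMB } cpu_state;

typedef struct {
    cpu_state state;
    uint32_t pc;
    bool valid;   /* false for an ARM-state target with bits 1:0 == 10 */
} interwork_result;

static interwork_result bx_interwork(uint32_t target)
{
    interwork_result r;
    if (target & 1u) {              /* lsbit set: Thumb, strip the 1 */
        r.state = STATE_THUMB;
        r.pc = target & ~1u;
        r.valid = true;
    } else {                        /* lsbit clear: ARM, needs 00 */
        r.state = STATE_ARM;
        r.pc = target;
        r.valid = (target & 3u) == 0;
    }
    assert((r.pc & 1u) == 0);       /* the pc never ends up odd */
    return r;
}
```

    For example, 0x1007 switches to Thumb with a pc of 0x1006, 0x1008 switches to ARM with a pc of 0x1008, and 0x1006 would be an ARM-state branch to a misaligned address, which is exactly the mistake a missing .thumb_func produces on such a core.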

    Ideally, the unconditional and conditional branches (not bx) never switch modes: they go ARM to ARM and Thumb to Thumb. The same goes for bl, although I have seen the GNU tools help out there. If you want your code pure, load the address into a register (which the tools have to get right, otherwise the whole toolchain is a failure) and use blx with that register instead of bl to the label, rather than relying on the toolchain to insert a trampoline for you.