gcc embedded inline-assembly microchip xc16

GCC extended asm differentiating literal vs register


I use inline asm extensively, and very often want a given operand to be able to take either a register or a literal. However, I cannot see how to make this an option with the Microchip XC16 suite.

As far as I can tell, you need to manually write the literal prefix # in the asm template, and that prefix is incompatible with register operands. What this means is that the following code does not compile:

asm("MOV %1, %0" : "=r" (res) : "i" (1));
Invalid operands specified ('mov 1,w0').

But the following does:

asm("MOV #%1, %0" : "=r" (res) : "i" (1));

Which is of course incompatible with registers:

asm("MOV #%1, %0" : "=r" (res) : "ri" (x));
Invalid operands specified ('mov #w0,w0').

So it seems to me that Microchip is not following GCC conventions, where I believe the literal prefix should have been emitted as part of the operand, and this makes mixed register/literal operands particularly hard to work with.

On the off chance: does anyone have any bright ideas on how to work around this problem?

For now I'm passing the result of __builtin_constant_p as an additional operand that I then test with .if in the asm, as follows, but to say it gets unwieldy fast would be an understatement.

asm(".if %[isk]  \n"
    "MOV #%1, %0 \n"
    ".else       \n"
    "MOV %1, %0  \n"
    ".endif      \n"
    : "=r" (res)
    : "ri" (x), [isk] "i" (__builtin_constant_p(x)));

And I don't even believe that GCC guarantees that %1 will be a literal if isk is true, which means having to if-then-else block it all out C side... sigh.
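
A minimal sketch of what splitting it out on the C side could look like. The names DELAY_CYCLES, DELAY_LIT, DELAY_REG and last_variant are made up for illustration, the count is assumed to be a 16-bit int, and the non-XC16 branch is only a stand-in so the dispatch can be compiled and checked off-target. It relies on the compiler discarding the branch that is not taken, which may not hold at every optimisation level:

```c
/* Hypothetical helpers, not part of XC16 or of the original post. */
#if defined(__XC16__)
/* Constant count: the template spells out the '#' literal prefix and the
 * "i" constraint demands a compile-time constant. */
#define DELAY_LIT(n) __asm__ volatile("REPEAT #%0 \n NOP" : : "i" (n))
/* Run-time count: the operand is printed as a working register name. */
#define DELAY_REG(n) __asm__ volatile("REPEAT %0 \n NOP" : : "r" (n))
#else
/* Host stand-in: only records which variant was picked. */
static int last_variant; /* 1 = literal form, 2 = register form */
#define DELAY_LIT(n) ((void)(n), (void)(last_variant = 1))
#define DELAY_REG(n) ((void)(n), (void)(last_variant = 2))
#endif

/* Pick the template on the C side, so each asm statement only ever sees
 * the operand kind its template was written for. */
#define DELAY_CYCLES(n)                 \
    do {                                \
        if (__builtin_constant_p(n)) {  \
            DELAY_LIT(n);               \
        } else {                        \
            DELAY_REG(n);               \
        }                               \
    } while (0)
```

With this, DELAY_CYCLES(10) would use the "REPEAT #%0" template and DELAY_CYCLES(v) for a run-time v would use "REPEAT %0", at the cost of writing every asm body twice.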


To clarify, MOV is just a sample instruction. These processors (dsPIC33Es) have zero-overhead single- and multi-instruction loops that can only be exploited from asm, the syntax of which looks like the following:

/* this code demonstrates compilation failure if cnt is a constant
 * as there is no # prefix before the %[cnt] */
asm("REPEAT %[cnt]        \n"
    "  MOV [%0++], [%1++] \n"
    : "+r" (src), "+r" (dst), "=m" (*dst)
    : "m" (*src), [cnt] "ri" (cnt));

This memcpy loop takes cnt+1 cycles to execute, which, due to pipelining, is actually twice as fast as a completely unrolled loop and 6 times faster than branching on each iteration. Along with the multi-instruction DO loop variant, these loops are pretty essential to get the most out of these processors.


Solution

  • I have found a way to detect in asm whether a given parameter is a literal or not. It's far from ideal, but it seems to work.

    First, in an asm header file, mark a symbol for each register:

    .irp r,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
      .set _IS_REG_w&r&, 1 ; mark all registers as "REGs"
      .set _IS_REG_W&r&, 1
    .endr
    

    And then to use:

    .ifdecl _IS_REG_%0
      REPEAT %0
    .else
      REPEAT #%0
    .endif
    

    Which can be wrapped up in an asm macro:

    .macro REPEATN cnt
        .ifdecl _IS_REG_&cnt&
            REPEAT \cnt
        .else
            REPEAT #\cnt
        .endif
    .endm
    

    For easy embedding in inline asm:

    void DelayCycles(int count)
    {
        asm("REPEATN %[cnt]    \n"
            "    NOP           \n"
            :
            : [cnt] "ri" (count));
    }
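
The decision REPEATN makes can be modelled in plain C, which may help when reasoning about what text GCC substitutes for %[cnt]. This is only a host-side sketch, not XC16 code: is_reg_operand and expand_repeatn are hypothetical names, and the operand is assumed to arrive as the exact string GCC would print (for example "w3" or "100").

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `operand` is one of w0..w15 or W0..W15, i.e. one of the
 * names for which the .irp loop defines an _IS_REG_ symbol. */
static int is_reg_operand(const char *operand)
{
    const char *d = operand + 1;
    size_t len;

    if (operand[0] != 'w' && operand[0] != 'W')
        return 0;
    len = strlen(d);
    if (len == 1)                       /* w0..w9 */
        return d[0] >= '0' && d[0] <= '9';
    if (len == 2)                       /* w10..w15 */
        return d[0] == '1' && d[1] >= '0' && d[1] <= '5';
    return 0;
}

/* Writes into `out` the instruction the macro would emit: the register
 * form as-is, anything else with the '#' literal prefix. */
static void expand_repeatn(const char *operand, char *out, size_t out_size)
{
    snprintf(out, out_size, "REPEAT %s%s",
             is_reg_operand(operand) ? "" : "#", operand);
}
```

As in the macro, anything that is not a known register name falls through to the literal form, so an operand such as a symbol name would also get the # prefix.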