Here is some output from my compiler in assembler. It's the MPLAB C30 C compiler, based on GCC v3.23, for a dsPIC33FJ128GP802, a 16-bit moderately high speed DSP/MCU.
212: inline uint16_t ror_16(uint16_t word, int num)
213: {
078C4 608270 and.w w1,#16,w4
078C6 DE0204 lsr w0,w4,w4
078C8 780101 mov.w w1,w2
078CA EA8102 com.w w2,w2
078CC EA8183 com.w w3,w3
078CE 610170 and.w w2,#16,w2
078D0 DD0002 sl w0,w2,w0
078D2 700004 ior.w w0,w4,w0
214: num &= 16; // limit to 16 shifts
215: return (word >> num) | (word << (16 - num));
216: }
078D4 060000 return
In particular I'm interested in the following:
and.w w1,#16,w4 AND W1 with 16, storing result in W4
lsr w0,w4,w4 Logical shift right W0 by W4 times storing result in W4
mov.w w1,w2 Move W1 to W2
com.w w2,w2 Logical complement of W2 stored in W2
com.w w3,w3 Logical complement of W3 stored in W3 <-- This line is confusing me
and.w w2,#16,w2 AND W2 with 16, storing result in W2
sl w0,w2,w0 (Logical) shift left W0 left by W2 times storing result in W0
ior.w w0,w4,w0 Inclusive OR of W0 and W4 stored in W0
return Return from function
W0..W15 are an array of sixteen on chip 16-bit registers.
Effectively this simplifies to (in a primitive RTL):
W4 := W1 & 16
W4 := W0 LSR W4
W1 := W2
W2 := COM W2
W3 := COM W3
W2 := W2 & 16
W0 := W0 SL W2
W0 := W0 | W4
return
Now I'm confused at why it is computing the complement of W3 when there are only two passed arguments (W0 and W1 - it uses the W array for passing arguments to functions for functions with smaller arguments.) W3 is never used in the calculation, and is never returned. In fact it doesn't even appear to have data in it: nothing is stored in it by the function, and only the callee will have some data in it (although functions are not required to preserve W0..W7 so the callee should not be relying on it.) Why is it included in the code? Is it just a compiler glitch or error, or am I missing something?
And it's not just this code - I'm seeing the very same oddness in other parts of code. Even code designed to calculate things like complements of a 16-bit variable always seem to use two registers. It has me lost!
The function is not coded to limit the count to 16 (which I suspect you mean 0 to 16) but limits it to 0 or 16.
Instead of
num &= 16
you perhaps want
num > 16 ? (num & 15) : num
Re: the question, since the function is inlined, it can only be answered by looking at where it is used. Perhaps W3 is used for something in the surrounding code. Or it could be a "bug," but one that only has performance, not correctness, impact.
If num can be only 0 or 16 (as in your code) then (16 - num) can also only be 16 or 0, which is why C30 can do the "subtract" with a complement and mask.
FYI, when I don't inline, in C30 I get:
34: uint16_t ror_16(uint16_t word, int num)
35: {
05AF4 608170 and.w 0x0002,#16,0x0004
05AF6 DE0102 lsr 0x0000,0x0004,0x0004
05AF8 EA8081 com.w 0x0002,0x0002
05AFA 6080F0 and.w 0x0002,#16,0x0002
05AFC DD0001 sl 0x0000,0x0002,0x0000
05AFE 700002 ior.w 0x0000,0x0004,0x0000
36: num &= 16; // limit to 16 shifts
37: return (word >> num) | (word << (16 - num));
38: }
05B00 060000 return
I might code this as
34: uint16_t ror_16(uint16_t word, int num)
35: {
05AF4 780100 mov.w 0x0000,0x0004
36: num &= 15; // mod 16
05AF6 60806F and.w 0x0002,#15,0x0000
37: return (num == 0) ? word : ((word >> num) | (word << (16 - num)));
05AF8 320004 bra z, 0x005b02
05AFA DE1080 lsr 0x0004,0x0000,0x0002
05AFC 100070 subr.w 0x0000,#16,0x0000
05AFE DD1000 sl 0x0004,0x0000,0x0000
05B00 708100 ior.w 0x0002,0x0000,0x0004
38: }
05B02 780002 mov.w 0x0004,0x0000
05B04 060000 return