I'm doing 32-bit multiplication for a Cortex-M3 controller using the "umull" ARM instruction. I'm getting the result in two 32-bit registers, RdLo and RdHi. How can I get the complete 64-bit result?
I wrote a function that takes two 32-bit values and multiplies them with the "umull" instruction, which places the result in two 32-bit registers, RdLo and RdHi.
I want to return a 64-bit result from that function.
long mulhi1, mullo1;

unsigned long Multiply(long i, long j)
{
    asm ("umull %0, %1, %[input_i], %[input_j];"
         : "=r" (mullo1), "=r" (mulhi1)
         : [input_i] "r" (i), [input_j] "r" (j)
         : /* No clobbers */
        );
}
I'm expecting a 64-bit return value from that function, but "umull" gives the result in mullo1 and mulhi1, two separate 32-bit registers.
What changes do I have to make to get a 64-bit result?
Method 1:
Use a union of 32-bit and 64-bit integers to receive the result:
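#include <stdint.h>

/* Overlay one 64-bit value on two 32-bit halves.
   __ARMEB__ is defined when compiling for big-endian ARM, where the
   high word comes first in memory. */
union dw {
    uint64_t dword;
    struct {
#ifdef __ARMEB__
        uint32_t high_word;
        uint32_t low_word;
#else
        uint32_t low_word;
        uint32_t high_word;
#endif
    };
};

uint64_t umull(uint32_t op1, uint32_t op2) {
    union dw result;
    /* umull RdLo, RdHi, Rn, Rm computes RdHi:RdLo = Rn * Rm (unsigned) */
    asm volatile(
        "umull %[result_low], %[result_high], %[operand_1], %[operand_2]"
        : [result_low] "=r" (result.low_word), [result_high] "=r" (result.high_word)
        : [operand_1] "r" (op1), [operand_2] "r" (op2)
    );
    return result.dword;  /* read both halves back as one 64-bit value */
}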
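The union trick depends on which half sits first in memory. Below is a sketch for checking that layout on any host, assuming GCC or Clang: it keys the struct order on the generic __BYTE_ORDER__ macro instead of __ARMEB__ so it also builds off-target, and the names dw_host and join_via_union are hypothetical.

```c
#include <stdint.h>

/* Same layout idea as union dw, but keyed on the GCC/Clang predefined
   __BYTE_ORDER__ macro (assumption) so it builds on non-ARM hosts too. */
union dw_host {
    uint64_t dword;
    struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint32_t high_word;
        uint32_t low_word;
#else
        uint32_t low_word;
        uint32_t high_word;
#endif
    };
};

/* Store the halves the way the asm outputs would, then read the whole
   64-bit value back through the union. */
uint64_t join_via_union(uint32_t lo, uint32_t hi)
{
    union dw_host u;
    u.low_word = lo;
    u.high_word = hi;
    return u.dword;
}
```

If the struct order did not match the target's byte order, join_via_union(0x89ABCDEF, 0x01234567) would come back with the halves swapped instead of 0x0123456789ABCDEF.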
Also note that unsigned long is a 32-bit type on 32-bit ARM. You need unsigned long long, or uint64_t from <stdint.h>.
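The type-size point is why the question's Multiply loses the high word. This small host-runnable sketch (product_full and product_truncated are hypothetical names) contrasts keeping the product in a 64-bit type with squeezing it through a 32-bit one, which is what an unsigned long return type does on 32-bit ARM. As an aside, GCC usually compiles the plain C widening multiply (uint64_t)a * b to a single umull on Cortex-M3, so it may be worth checking the generated code before reaching for inline asm.

```c
#include <limits.h>
#include <stdint.h>

/* uint64_t is exactly 64 bits wherever it exists; unsigned long long is
   at least 64 bits. unsigned long is only guaranteed 32 bits. */
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t is 64 bits");
_Static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
               "unsigned long long is at least 64 bits");

/* Full 32x32 -> 64-bit product: widen one operand before multiplying. */
uint64_t product_full(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* What a 32-bit return type keeps: only the low word (RdLo). */
uint32_t product_truncated(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}
```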
Method 2:
Use Q
and R
modifiers on a 64-bit integer in the assembly template to specify its halves.
(This is based on an answer to "How can I get the ARM MULL instruction to produce its output in a uint64_t in gcc?", of which this question may be a duplicate. Answers there also show using a shift and an OR to combine the 32-bit halves into a uint64_t, which optimizes away on a 32-bit machine.)
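A minimal sketch of the shift-and-OR approach, assuming GCC or Clang: the asm returns the two halves separately, as in the question, and a helper joins them. The names umull_shift_or and combine_halves are hypothetical, and the #else branch is a plain C fallback added only so the sketch can be compiled and tested on a non-ARM host.

```c
#include <stdint.h>

/* Join the two 32-bit halves that umull leaves in RdLo and RdHi.
   On a 32-bit target this usually becomes plain register use. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

uint64_t umull_shift_or(uint32_t op1, uint32_t op2)
{
    uint32_t lo, hi;
#if defined(__thumb2__) || (defined(__arm__) && !defined(__thumb__))
    /* ARM state or Thumb-2 (e.g. Cortex-M3): use umull directly. */
    __asm__("umull %[lo], %[hi], %[a], %[b]"
            : [lo] "=r" (lo), [hi] "=r" (hi)
            : [a] "r" (op1), [b] "r" (op2));
#else
    /* Assumed host fallback: split a C widening multiply into halves. */
    uint64_t p = (uint64_t)op1 * op2;
    lo = (uint32_t)p;
    hi = (uint32_t)(p >> 32);
#endif
    return combine_halves(lo, hi);
}
```

For example, umull_shift_or(0xFFFFFFFF, 0xFFFFFFFF) yields 0xFFFFFFFE00000001, with 0xFFFFFFFE coming from RdHi and 0x00000001 from RdLo.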
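#include <stdint.h>

uint64_t umull(uint32_t op1, uint32_t op2) {
    uint64_t result;
    /* %Q names the register holding the least significant 32 bits of
       result, %R the register holding the most significant 32 bits. */
    asm volatile(
        "umull %Q[dwresult], %R[dwresult], %[operand_1], %[operand_2]"
        : [dwresult] "=r" (result)
        : [operand_1] "r" (op1), [operand_2] "r" (op2)
    );
    return result;
}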
This is not exactly well documented; one has to browse the GCC sources to find it.