I am looking for an efficient algorithm that computes arithmetic shift-right of integers that rounds to nearest integer with halfway rounding toward zero behavior. An answer can be a proper description of such an algorithm or its implementation (e.g. in assembly or C).
You can assume:
"Efficient algorithm" asks loosely for a reasonable tradeoff between runtime and space complexity. It boils down to minimum instructions needed to solve the task while not requiring an excessive amount of space/data to do so. As a reference the Cortex-M0 instruction set is proposed. In particular it places the following limits on a solution: no floating point instructions, no integer division instruction, no integer modulo instruction.
This is a prototype in C language:
int asrz(int x, int n);
The algorithm should compute
Y = X >> N
on integer values using the following rounding behavior:
The true result is rounded to nearest integer. As a tie-breaking rule, if the fractional part is 0.5, rounding is done toward zero. Here are some examples of this:
0 >> 1 = 0
1 >> 1 = 0 (0.5)
2 >> 1 = 1
3 >> 1 = 1 (1.5)
...
2 >> 2 = 0 (0.5)
3 >> 2 = 1 (0.75)
4 >> 2 = 1
5 >> 2 = 1 (1.25)
6 >> 2 = 1 (1.5)
7 >> 2 = 2 (1.75)
...
12 >> 3 = 1 (1.5)
13 >> 3 = 2 (1.625)
The results are symmetrical around zero for negative input values:
-1 >> 1 = 0 (-0.5)
-2 >> 1 = -1
-3 >> 1 = -1 (-1.5)
...
-2 >> 2 = 0 (-0.5)
-3 >> 2 = -1 (-0.75)
-4 >> 2 = -1
-5 >> 2 = -1 (-1.25)
-6 >> 2 = -1 (-1.5)
-7 >> 2 = -2 (-1.75)
...
-12 >> 3 = -1 (-1.5)
-13 >> 3 = -2 (-1.625)
The behavior differs from ASR instruction available in most instruction sets which:
Let's do “half up” rounding first, as it's the easiest case.
As far as I know, the simplest way on architectures with a carry flag is to add carry out from the right shift to the result:
sar $10, %eax ; shift eax to the right 10 places arithmetically
adc $0, %eax ; add the carry to eax
On Cortex-M0, this can be done similarly, e.g. to shift r1
10 places to the right with rounding:
movs r0, #0 @ clear r0
asrs r1, #10 @ shift r1 10 places to the right arithmetically
adcs r1, r0 @ add carry to r1
Note that due to lack of an adc
instruction with immediate operand, we need to hold zero in some register. This can be any register and if you do the rounding in a loop, you can clear a register ahead of time and reuse it throughout the code.
Likewise, if you want to add the result of the rounded shift to some other term, you can just add that other term in the adcs
step instead of adding zero, saving a register and an instruction.
The way this works is that an arithmetic right shift normally rounds towards negative infinity. If following the right shift, the carry flag is set, the result of the unrounded division by 2k has a fractional part of 0.5 or more (or less than 0.5 in case of a negative number) By adding that fractional part back in, we turn a rounding towards negative infinity into a rounding to nearest.
This ports nicely to ARM, though on ARM64 and architectures without carry flags such as RISC-V, it's more annoying to do. You have to manually compute the carry and then add it.
To turn this into half down rounding (i.e. round down if the fractional part is 0.5), the same approach can be used if 1 is subtracted from the number to be rounded before the rounding. As the 1 is shifted to the right, it subtracts an ε ≤ 0.5 from the number to be rounded, tilting the scale such that the 0.5 case is rounded down instead of up, without affecting any other number.
One limitation of this approach is that the shift amount must be strictly positive; half up rounding also works for a (variable) shift amount of zero, but this one will not.
To round half away from zero, we want half-down rounding if the number is negative and half-up rounding if it is positive. This is achieved by subtracting 1 before rounding if the number is negative. We can achieve that by first computing the sign of the number as 0 or −1, then adding the sign, and then rounding. On x86 for example:
cdq ; compute the sign of eax in edx
add %edx, %eax ; add -1/0 to eax
sar $10, %eax ; shift to the right arithmetically
adc $0, %eax ; add carry to eax
or without being constrainted to eax
:
bt $31, %eax ; set carry flag to sign of eax
sbb $0, %eax ; subtract 1 from eax if carry set
sar $10, %eax ; shift to the right arithmetically
adc $0, %eax ; add carry to eax
On Cortex-M0 it's also very easy:
asrs r0, r1, #31 @ compute the sign of r1 in r0
adds r1, r0 @ add -1/0 to r1
movs r0, #0 @ clear r0
asrs r1, #10 @ shift r1 to the right arithmetically
adcs r1, r0 @ add carry to r1
To round half towards zero, we use the same approach as for half away from zero rounding, except we add the complemented sign, effecting half down rounding for the positive case and half up rounding for the negative case.
On x86, this is done the same way as before
cdq ; compute the sign of eax in edx
not %edx ; complement the sign
add %edx, %eax ; add -1/0 to eax
sar $10, %eax ; shift to the right arithmetically
adc $0, %eax ; add carry to eax
or with one less instruction but not necessarily faster:
bt $31, %eax ; set carry flag to the sign of eax
adc $-1, %eax ; add -1 to eax if carry clear
sar $10, %eax ; shift to the right arithmetically
adc $0, %eax ; add carry to eax
whereas on Cortex-M0 we can use a trick to avoid an extra instruction:
movs r0, #0 @ clear r0
cmn r1, r1 @ set carry if r1 is negative
sbcs r1, r0 @ decrement r1 if r1 is not negative
asrs r1, #10 @ shift r1 to the right arithmetically
adcs r1, r0 @ add carry to r1