I have a problem with the C code below, which contains inline SPARC assembly. The code is compiled and run on Debian 9.0 SPARC64. It performs a simple summation and prints the result, which should equal nLoop.
The problem is that for an initial number of iterations greater than 1e+9, the final sum is systematically equal to 1410065408. I don't understand why, since I explicitly declared sum as unsigned long long int, so sum can be in the range [0, 18,446,744,073,709,551,615].
For example, for nLoop = 1e+9, I expect sum to be equal to 1e+9.
Does the issue come instead from the inline SPARC assembly, which might not handle 64-bit variables (as input or output)?
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int i;
// Init sum
unsigned long long int sum = 0ULL;
// Number of iterations
unsigned long long int nLoop = 10000000000ULL;
// Loop with Sparc assembly into C source
asm volatile ("clr %%g1\n\t"
"clr %%g2\n\t"
"mov %1, %%g1\n" // %1 = input parameter
"loop:\n\t"
"add %%g2, 1, %%g2\n\t"
"subcc %%g1, 1, %%g1\n\t"
"bne loop\n\t"
"nop\n\t"
"mov %%g2, %0\n" // %0 = output parameter
: "=r" (sum) // output
: "r" (nLoop) // input
: "g1", "g2"); // clobbers
// Print results
printf("Sum = %llu\n", sum);
return 0;
}
How can I fix this range problem and use 64-bit variables in the SPARC assembly code?
PS: I tried compiling with gcc -m64; the issue remains.
As requested by @zwol, below is the SPARC assembly output generated with: gcc -O2 -m64 -S loop.c -o loop.s
.file "loop.c"
.section ".text"
.section .rodata.str1.8,"aMS",@progbits,1
.align 8
.LC0:
.asciz "Sum = %llu\n"
.section .text.startup,"ax",@progbits
.align 4
.global main
.type main, #function
.proc 04
main:
.register %g2, #scratch
save %sp, -176, %sp
sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
call __sparc_get_pc_thunk.l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
sethi %hi(9764864), %o1
or %o1, 761, %o1
sllx %o1, 10, %o1
#APP
! 13 "loop.c" 1
clr %g1
clr %g2
mov %o1, %g1
loop:
add %g2, 1, %g2
subcc %g1, 1, %g1
bne loop
nop
mov %g2, %o1
! 0 "" 2
#NO_APP
mov 0, %i0
sethi %gdop_hix22(.LC0), %o0
xor %o0, %gdop_lox10(.LC0), %o0
call printf, 0
ldx [%l7 + %o0], %o0, %gdop(.LC0)
return %i7+8
nop
.size main, .-main
.ident "GCC: (Debian 7.3.0-15) 7.3.0"
.section .text.__sparc_get_pc_thunk.l7,"axG",@progbits,__sparc_get_pc_thunk.l7,comdat
.align 4
.weak __sparc_get_pc_thunk.l7
.hidden __sparc_get_pc_thunk.l7
.type __sparc_get_pc_thunk.l7, #function
.proc 020
__sparc_get_pc_thunk.l7:
jmp %o7+8
add %o7, %l7, %l7
.section .note.GNU-stack,"",@progbits
UPDATE 2:
As suggested by @Martin Rosenau, I made the following modifications:
loop:
add %g2, 1, %g2
subcc %g1, 1, %g1
bpne %icc, loop
bpne %xcc, loop
nop
mov %g2, %o1
But at compile time, I get:
Error: Unknown opcode: `bpne'
What could be the reason for this compilation error?
subcc %%g1, 1, %%g1
bne loop
Your problem is the bne instruction:
Unlike x86-64 CPUs, SPARC64 CPUs don't have different instructions for 32-bit and 64-bit subtraction:
If you subtract 1 from 0x12345678, the result is 0x12345677. If you subtract 1 from 0xF00D12345678, the result is 0xF00D12345677. So if you only use the lower 32 bits of a register, a 64-bit subtraction has the same effect as a 32-bit subtraction.
Therefore the SPARC64 CPUs do not have different instructions for 64-bit and 32-bit addition, subtraction, multiplication, left shift etc.
These CPUs only have different instructions for 32-bit and 64-bit operations when the upper 32 bits influence the lower 32 bits (e.g. right shift).
However, the zero flag depends on the result of the subcc operation.
To solve this problem, the SPARC64 CPUs have each of the integer flags (zero, overflow, carry, sign) twice:
The 32-bit zero flag is set if the lower 32 bits of the result are zero; the 64-bit zero flag is set only if all 64 bits of the result are zero.
To be compatible with existing 32-bit programs, the bne instruction checks the 32-bit zero flag, not the 64-bit zero flag.
is systematically equal to 1410065408
1e10 = 0x200000000 + 1410065408, so after 1410065408 steps the value 0x200000000 is reached. Its lower 32 bits are 0, so bne will not jump any more.
However, for 1e11 you should not get 1410065408 but 1215752192 as a result, because 1e11 = 0x1700000000 + 1215752192.
bne
There is a new instruction named bpne which has up to 4 arguments!
In the simplest variant (with only two arguments) the instruction should (I have not used Sparc for 5 years now, so I'm not sure) work like this:
bpne %icc, loop # Like bne (based on the 32-bit result)
bpne %xcc, loop # Like bne, but based on the 64-bit result
EDIT
Error: Unknown opcode: 'bpne'
I just tried using GNU assembler:
The GNU assembler names the new instruction bne, just like the old one:
bne loop # Old variant
bne %icc, loop # New variant based on the 32-bit result
bne %xcc, loop # (New variant) Based on the 64-bit result
subcc %g1, 1, %g1
bpne %icc, loop
bpne %xcc, loop
nop
The first bpne (or bne) makes no sense: whenever the first branch would jump, the second one would jump as well. And if you don't use .reorder (although this is the default), you would also need to add a nop between the two branch instructions...
The code should look like this (assuming your assembler also uses the name bne for bpne):
subcc %g1, 1, %g1
bne %xcc, loop
nop