cgccassemblyinline-assemblysparc

Issue with unsigned long long int with sparc assembly code on sparc64


I have an issue with the C code below into which I have included Sparc Assembly. The code is compiled and running on Debian 9.0 Sparc64. It does a simple summation and print the result of this sum which equals to nLoop.

The problem is that for an initial number of iterations greater than 1e+9, the final sum at the end is systematically equal to 1410065408 : I don't understand why since I put explicitly unsigned long long int type for sum variable and so sum can be in [0, +18,446,744,073,709,551,615] range.

For example, for nLoop = 1e+9, I expect sum to be equal to 1e+9.

Does issue come rather from included Assembly Sparc code which could not handle 64 bits variables (in input or output) ?

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
  int i;
  // Init sum
  unsigned long long int sum = 0ULL;
  // Number of iterations
  unsigned long long int nLoop = 10000000000ULL;

   // Loop with Sparc assembly into C source
   asm volatile ("clr %%g1\n\t"
                 "clr %%g2\n\t"
                 "mov %1, %%g1\n" // %1 = input parameter
                 "loop:\n\t"
                 "add %%g2, 1, %%g2\n\t"
                 "subcc %%g1, 1, %%g1\n\t"
                 "bne loop\n\t"
                 "nop\n\t"
                 "mov %%g2, %0\n" // %0 = output parameter
                 : "=r" (sum)     // output
                 : "r" (nLoop)    // input
                 : "g1", "g2");   // clobbers

  // Print results
  printf("Sum = %llu\n", sum);

  return 0;

}

How to fix this problem of range and allow to use 64 bits variables into Sparc Assembly code ?

PS: I tried to compile with gcc -m64, issue remains.

Update 1

As requested by @zwol, below is the output Assembly Sparc code generated with : gcc -O2 -m64 -S loop.c -o loop.s

        .file   "loop.c"
        .section        ".text"
        .section        .rodata.str1.8,"aMS",@progbits,1
        .align 8
.LC0:
        .asciz  "Sum = %llu\n"
        .section        .text.startup,"ax",@progbits
        .align 4
        .global main
        .type   main, #function
        .proc   04
main:
        .register       %g2, #scratch
        save    %sp, -176, %sp
        sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
        call    __sparc_get_pc_thunk.l7
         add    %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
        sethi   %hi(9764864), %o1
        or      %o1, 761, %o1
        sllx    %o1, 10, %o1
#APP
! 13 "loop.c" 1
        clr %g1
        clr %g2
        mov %o1, %g1
loop:
        add %g2, 1, %g2
        subcc %g1, 1, %g1
        bne loop
        nop
        mov %g2, %o1

! 0 "" 2
#NO_APP
        mov     0, %i0
        sethi   %gdop_hix22(.LC0), %o0
        xor     %o0, %gdop_lox10(.LC0), %o0
        call    printf, 0
         ldx    [%l7 + %o0], %o0, %gdop(.LC0)
        return  %i7+8
         nop
        .size   main, .-main
        .ident  "GCC: (Debian 7.3.0-15) 7.3.0"
        .section        .text.__sparc_get_pc_thunk.l7,"axG",@progbits,__sparc_get_pc_thunk.l7,comdat
        .align 4
        .weak   __sparc_get_pc_thunk.l7
        .hidden __sparc_get_pc_thunk.l7
        .type   __sparc_get_pc_thunk.l7, #function
        .proc   020
__sparc_get_pc_thunk.l7:
        jmp     %o7+8
         add    %o7, %l7, %l7
        .section        .note.GNU-stack,"",@progbits

UPDATE 2:

As suggested by @Martin Rosenau, I did following modifications :

loop:
        add %g2, 1, %g2
        subcc %g1, 1, %g1
        bpne %icc, loop
        bpne %xcc, loop
        nop
        mov %g2, %o1

But at the compilation, I get :

Error: Unknown opcode: `bpne'

What could be the reason for this compilation error ?


Solution

  • subcc %%g1, 1, %%g1
    bne loop
    

    Your problem is the bne instruction:

    Unlike the x86-64 CPU Sparc64 CPUs don't have different instructions for 32- and 64-bit subtraction:

    If you want subtract 1 from 0x12345678 the result is 0x12345677. If you subtract 1 from 0xF00D12345678 the result is 0xF00D12345677 so if you only use the lower 32 bits of a register a 64-bit subtraction has the same effect as the 32-bit subtraction.

    Therefore the Sparc64 CPUs do not have different instructions for 64-bit and 32-bit addition, subtraction, multiplication, left shift etc.

    These CPUs have different instructions for 32-bit and 64-bit operations when the upper 32 bits influence the lower 32 bits (e.g. right shift).

    However the zero flag depends on the result of the subcc operation.

    To solve this problem the Sparc64 CPUs have each of the integer flags (zero, overflow, carry, sign) twice:

    The 32-bit zero flag will be set if the lower 32 bits of a register are zero; the 64-bit zero flag will be set if all 64 bits of a register are zero.

    To be compatible with existing 32-bit programs the bne instruction will check the 32-bit zero flag, not the 64-bit zero flag.

    is systematically equal to 1410065408

    1e10 = 0x200000000 + 1410065408 so after 1410065408 steps the value 0x200000000 is reached which has the lower 32 bits set to 0 and bne will not jump any more.

    However for 1e11 you should not get 1410065408 but 1215752192 as a result because 1e11 = 0x1700000000 + 1215752192.

    bne

    There is a new instruction named bpne which has up to 4 arguments!

    In the simplest variant (with only two arguments) the instruction should (I have not used Sparc for 5 years now, so I'm not sure) work like this:

    bpne %icc, loop   # Like bne (based on the 32-bit result)
    bpne %xcc, loop   # Like bne, but based on the 64-bit result
    

    EDIT

    Error: Unknown opcode: 'bpne'
    

    I just tried using GNU assembler:

    GNU assembler names the new instruction bne - just like the old one:

    bne loop         # Old variant
    bne %icc, loop   # New variant based on the 32-bit result
    bne %xcc, loop   # (New variant) Based on the 64-bit result
    
      subcc %g1, 1, %g1
      bpne %icc, loop
      bpne %xcc, loop
      nop
    

    The first bpne (or bne) makes no sense: Whenever the first line would do the jump the second line would also jump. And if you don't use .reorder (however this is the default) you would also need to add a nop between the two branch instructions...

    The code should look like this (assuming your assembler also names bpne bne):

       subcc %g1, 1, %g1
       bne %xcc, loop
       nop