c mips gnu simd intrinsics

Wrong results when using GCC built-in vector (SIMD) functions for MIPS


I am working on an embedded Linux system (kernel 5.10.24) for a MIPS CPU, and I want to test the GCC built-in functions for MIPS SIMD (MSA) listed at https://gcc.gnu.org/onlinedocs/gcc/MIPS-SIMD-Architecture-Built-in-Functions.html

I tested v4f32 __builtin_msa_fsqrt_w (v4f32); with the following program.

#define _GNU_SOURCE
#include <msa.h>
#include <stdint.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

#define ALIGN16 __attribute__((aligned(16)))
ALIGN16 uint32_t a[] = {64, 128, 256, 512};
ALIGN16 uint32_t r[] = {64, 128, 256, 512};
ALIGN16 uint32_t sr[] = {64, 128, 256, 512};

static int verification_test(void)
{
    int i = 0;
    v4i32 va, fr;

    for (i = 0; i < sizeof(a)/sizeof(a[0]); i++) {
        sr[i] = sqrt(a[i]);
    }
    // Get SQRT with MSA builtin functions
    va = __builtin_msa_ld_w(a, 0);
    fr = (v4i32)__builtin_msa_fsqrt_w((v4f32)va);
    __builtin_msa_st_w(fr, r, 0); // Save result in fr to array of r.

    for (i = 0; i < sizeof(r)/sizeof(r[0]); i++) {
        printf("%d: %f\n", i, (double)sr[i]);
        printf("%d: %f\n", i, (double)r[i]);
    }
    return 0;
}
int main()
{
    verification_test();
    return 0;
}

It calculates the square roots of 4 numbers, but the results from the built-in function differ from the results of sqrt(), as follows.

0: 8.000000
0: 464848115.000000
1: 11.000000
1: 469762048.000000
2: 16.000000
2: 473236723.000000
3: 22.000000
3: 478150656.000000

So what is wrong with my code, and how can I fix it?

Update: code changed according to the answer

I changed the line to

fr = __builtin_msa_ftint_s_w(__builtin_msa_fsqrt_w(__builtin_msa_ffint_s_w(va)));

and got the following results:

0: 8.000000
0: 8.000000
1: 11.000000
1: 11.000000
2: 16.000000
2: 16.000000
3: 22.000000
3: 23.000000 <<??

The last result, which comes from the MSA instructions, differs slightly from the result of sqrt(). Does that make sense?


Solution

  • Educated guess here, as I don't really know MIPS: a plain cast between integer and float vector types only reinterprets the bits, so converting the values needs the appropriate intrinsic. Try changing

    (v4i32)__builtin_msa_fsqrt_w((v4f32)va) 
    

    to

    __builtin_msa_ftint_u_w(__builtin_msa_fsqrt_w(__builtin_msa_ffint_u_w(va)))
    

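    Note that the _u_ variants take and return unsigned vectors (v4u32); for a v4i32 such as va, the _s_ variants (__builtin_msa_ffint_s_w and __builtin_msa_ftint_s_w) are the matching ones. You should probably also check the documentation, if you can find any, for how the conversion rounds, to make sure it does what you expect.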