c gcc assembly mips64

Understanding asm blocks written for GCC


What does the following assembly mean in plain C? (It is meant to be compiled with GCC.)

asm volatile
    (
    "mov.d %0,%4\n\t"
    "L1: bge %2,%3,L2\n\t"
    "gsLQC1 $f2,$f0,0(%1)\n\t"
    "gsLQC1 $f6,$f4,0(%5)\n\t"
    "madd.d %0,%0,$f6,$f2\n\t"
    "madd.d %0,%0,$f4,$f0\n\t"
    "add %1,%1,16\n\t"
    "add %2,%2,2\n\t"
    "add %5,%5,16\n\t"
    "j L1\n\t"
    "L2: nop\n\t" 
    :"=f"(sham)
    :"r"(foo),"r"(bar),"r"(ro),"f"(sham),"r"(bo)
    :"$f0","$f2","$f4","$f6"
    );

After several hours of searching and reading I've come up with the following assembly code in AT&T syntax:

mov.d %xmm0,%xmm1
L1: bge %ebx,%ecx,L2
gsLQC1 $f2,$f0,0(%eax)
gsLQC1 $f6,$f4,0(%esi)
madd.d %xmm0,%xmm0,$f6,$f2
madd.d %xmm0,%xmm0,$f4,$f0
add %eax,%eax,16
add %ebx,%ebx,2
add %esi,%esi,16
jmp L1
L2: nop

I'm in the process of finding a way to run this on Windows and will update when I do figure out a way to do so (after fixing all of the mistakes that I'm sure I've made).

I have very little experience with x86 assembly. I can vaguely see that this is a loop, but I haven't been able to find out what the gsLQC1 instruction means or what the purpose of the loop is.

If you have any questions for me, I'll be happy to answer them. If you have any insights, I would love to hear them. Thank you for your time.

EDIT:

The function itself performs a Singular Value Decomposition (SVD), which is an operation on matrices.

I've added some comments of my own to the block below. The original author of the assembly did not write them, but I am about 80% confident that they are correct, given my research into GCC's asm block notation.

    asm volatile
       (
       "mov.d %0,%4\n\t"
       "L1: bge %2,%3,L2\n\t"
       "gsLQC1 $f2,$f0,0(%1)\n\t"
       "gsLQC1 $f6,$f4,0(%5)\n\t"
       "madd.d %0,%0,$f6,$f2\n\t"
       "madd.d %0,%0,$f4,$f0\n\t"
       "add %1,%1,16\n\t"
       "add %2,%2,2\n\t"
       "add %5,%5,16\n\t"
       "j L1\n\t"
       "L2: nop\n\t" 
       :"=f"(sham) /*Corresponds to %0 in the above code*/
       :"r"(foo) /*Corresponds to %1*/,"r"(bar) /*%2*/,"r"(ro) /*%3*/,"f"(sham) /*%4*/,"r"(bo) /*%5*/
       :"$f0","$f2","$f4","$f6"
       );

I assumed that this was x86, but I was most likely wrong. I believe the above is MIPS64 assembly written for a processor in the Loongson family.

Thank you for the interest in the question. I appreciate your time. Again, if there are any other questions, I would be happy to try my best to answer them.

P.S. The original code can be found here, and the assembly that I am asking about starts on line 189.


Solution

  • This isn't really an answer, but it doesn't fit in a comment either. Since you omit several critical pieces of information (which processor the instructions target, the data types of the parameters, a general sense of what the code is doing, etc.), it's hard to come up with a good answer.

    In a general sense, I'd be thinking:

    float messy(const float *foo, int bar, int ro, const float *bo)
    {
        float sham = 0;
    
        while (bar < ro)
        {
           __m256 a = _mm256_load_ps(foo);
           __m256 b = _mm256_load_ps(bo);
    
           __m256 c = _mm256_add_ps(a, a);
           __m256 d = _mm256_add_ps(b, b);
    
           foo += 2;
           bar += 2;
           bo += 2;
        }
    
        return sham;
    }
    

    That's not going to be quite right, since (among other things) sham isn't getting set. But it's a place to start. Without details of what madd.d does (which is hard to say without knowing what hardware we're talking about), that's as close as I can get you.

    Just to emphasize what I said in my comment, the original code does not appear to be well written (modifying read-only parameters, double jumps, NO COMMENTS, etc.).
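If the block really is MIPS64 for a Loongson part, as the question's edit suggests, the loop looks like an accumulating dot product. Below is a minimal plain-C sketch of that reading, not a verified translation. It rests on these assumptions: foo and bo point to arrays of double; gsLQC1 is the Loongson 128-bit load that fills two FP registers with two adjacent doubles; and madd.d fd,fr,fs,ft computes fs * ft + fr, as in the standard MIPS64 ISA. The names dot_accumulate and acc are hypothetical, and which of the two products is added first within an iteration is a guess, which can matter for floating-point rounding.

```c
/*
 * Hypothetical plain-C reading of the asm block, assuming double-precision
 * data and the instruction semantics described above.
 */
double dot_accumulate(const double *foo, int bar, int ro,
                      double sham, const double *bo)
{
    double acc = sham;              /* mov.d %0,%4 */

    while (bar < ro) {              /* L1: bge %2,%3,L2 leaves when bar >= ro */
        /* The two gsLQC1 loads fetch foo[0], foo[1] and bo[0], bo[1]. */
        acc += foo[0] * bo[0];      /* madd.d: acc = x * y + acc */
        acc += foo[1] * bo[1];      /* second madd.d */

        foo += 2;                   /* add %1,%1,16 (16 bytes = 2 doubles) */
        bo  += 2;                   /* add %5,%5,16 */
        bar += 2;                   /* add %2,%2,2 */
    }

    return acc;                     /* the asm writes this back to sham (%0) */
}
```

Under this reading, the asm statement would correspond to `sham = dot_accumulate(foo, bar, ro, sham, bo);`, except that the asm also advances foo, bar and bo in place, even though they are declared as inputs only.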