assembly arm64 complex-numbers neon fma

If ARM has FMLA and FMLS, why does it have only FCMLA?


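The Arm® Architecture Reference Manual for A-profile architecture defines the following instructions: FMLA (floating-point fused multiply-add), FMLS (floating-point fused multiply-subtract), and FCMLA (floating-point complex multiply accumulate).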

One may wonder: why is there no FCMLS (floating-point complex multiply subtract)?

In other words: is it known why ARM decided not to implement FCMLS?


Note that FCMLA stands for "Floating-point complex multiply accumulate", not "Floating-point complex multiply-add", nor "Floating-point fused complex multiply-add".


Solution

  • Note first that FMLA and FCMLA are not fully analogous, because despite its name, FCMLA does not perform a full complex multiplication of its operands. Rather, it does what is effectively half of a complex multiplication. It multiplies both the real and imaginary parts of one operand, optionally rotated by some multiple of 90 degrees, by either the real or imaginary part (depending on the selected rotation) of the other operand.

    There are some notes on using this instruction in Section K13.1 of the Architecture Reference Manual, which I recommend reading.

    So if V0, V1, V2 each contain a double-precision complex number, then a full complex multiply-add V0 = V0 + (V1 * V2) is accomplished by

    FCMLA V0.2D, V1.2D, V2.2D, #0   // V0 += (Re(V1) * Re(V2), Re(V1) * Im(V2))
    FCMLA V0.2D, V1.2D, V2.2D, #90  // V0 += (-Im(V1) * Im(V2), Im(V1) * Re(V2))
    

    If you want a complex multiply-subtract V0 = V0 - (V1 * V2), then simply include an additional 180-degree rotation in each instruction. Recall that rotating a complex number by 180 degrees is equivalent to negating it.

    FCMLA V0.2D, V1.2D, V2.2D, #180 // V0 += (-Re(V1) * Re(V2), -Re(V1) * Im(V2))
    FCMLA V0.2D, V1.2D, V2.2D, #270 // V0 += (Im(V1) * Im(V2), -Im(V1) * Re(V2))
    

    So in essence, a pair of FCMLA instructions can perform either a multiply-add or a multiply-subtract, depending on the specified immediate rotation angle. There is no FCMLS because it would be redundant.
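To check the rotation argument without ARM hardware, here is a minimal Python sketch that models one FCMLA on a single complex element. It is only an illustration: the function names `fcmla`, `cmla` and `cmls` are made up for this example, it assumes the second source operand is the one that gets rotated and the first supplies the real or imaginary scale factor, and it ignores the fused rounding of the real instruction.

```python
def fcmla(acc, n, m, rot):
    """Model one FCMLA on a single complex element (hypothetical helper).

    acc, n, m are (re, im) tuples; rot is 0, 90, 180 or 270.
    Fused-rounding behaviour of the real instruction is not modelled.
    """
    m_re, m_im = m
    # Rotate the second source operand counterclockwise by rot degrees.
    rotated = {
        0: (m_re, m_im),
        90: (-m_im, m_re),
        180: (-m_re, -m_im),
        270: (m_im, -m_re),
    }[rot]
    # Rotations 0/180 scale by Re(n); rotations 90/270 scale by Im(n).
    scale = n[0] if rot in (0, 180) else n[1]
    return (acc[0] + scale * rotated[0], acc[1] + scale * rotated[1])


def cmla(acc, n, m):
    """acc + n * m, built from the #0 and #90 halves."""
    acc = fcmla(acc, n, m, 0)
    return fcmla(acc, n, m, 90)


def cmls(acc, n, m):
    """acc - n * m, built from the #180 and #270 halves."""
    acc = fcmla(acc, n, m, 180)
    return fcmla(acc, n, m, 270)


if __name__ == "__main__":
    v0, v1, v2 = (1.0, 2.0), (3.0, 4.0), (5.0, 6.0)
    print(cmla(v0, v1, v2))  # (1+2i) + (3+4i)(5+6i) = -8+40i
    print(cmls(v0, v1, v2))  # (1+2i) - (3+4i)(5+6i) = 10-36i
```

Each call to `fcmla` contributes only half of the product, and the only difference between `cmla` and `cmls` is the extra 180 degrees in the rotation, which is why a separate subtract form would add nothing.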