Interpreting AMD RDNA3 instruction names

I am trying to analyze my OpenCL kernel as compiled for an RDNA3 AMD GPU.

When I load my OpenCL kernel in the analyzer, it displays the assembly instructions for it in gfx1102 (RDNA3) assembly.

So far, so good.

I have difficulty interpreting the instruction names, though. I can look them up in the ISA documentation, but the full instruction name is often not listed.

In my kernel's inner loop, I do multiply-adds on 16-bit floating-point values.

I see this translated into:

v_fmac_f16_e32    v?, v?, v?

That seems appropriate: as I understand it, the 'v' stands for vector, fmac for fused multiply-accumulate (a fused multiply-add whose addend is the destination register, D = S0 * S1 + D), and f16 for the 16-bit float arguments.

But the ISA document does not describe the _e32 suffix.

What is the meaning of the _e32 suffix in RDNA3 assembly?

Solution

I think the ..._e32 suffix means the instruction uses a 32-bit encoding. Many instructions whose general form is encoded in 64 bits (..._e64) also have a more compact encoding that covers the common cases of their controls and inputs (for example, no source modifiers such as negation, and a VGPR as the second source). The instruction decoder expands the compact encoding into the larger one by filling in default values. In other words, you could probably replace any ..._e32 op with the same op carrying an ..._e64 suffix, and the program would be semantically equivalent (it would just encode a little larger). You could confirm my guess by testing this.
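A minimal Python sketch of that expansion, for a plain two-source VOP2 op such as v_mul_f32. It assumes the VOP2/VOP3 field layouts from the RDNA ISA manuals (they match the encoded bytes in the disassembly below), and it assumes that VOP2 opcode n is promoted to VOP3 opcode 256 + n, which I infer from v_mul_f32_e64 carrying opcode 264 below if v_mul_f32's VOP2 opcode is 8 on this target. Ops with an implicit VCC operand, a tied accumulator (v_fmac_*), or a trailing literal constant are not handled. The function name is made up for illustration.

```python
from typing import Tuple

VOP3_ENCODING = 0b110101  # bits 31:26 of the first dword of a VOP3 (_e64) op
VOP3_OP_OFFSET = 256      # assumption: VOP2 opcode n becomes VOP3 opcode 256 + n
VGPR_BASE = 256           # 9-bit source operands encode VGPR n as 256 + n
LITERAL = 255             # src0 code meaning "a 32-bit literal follows"


def expand_vop2_to_vop3(word: int) -> Tuple[int, int]:
    """Hypothetical helper: rewrite one VOP2 (_e32) word as a VOP3 (_e64) pair.

    Only the fields VOP2 has are copied; the VOP3-only fields (abs, neg,
    clamp, opsel, omod, src2) are left at their default of zero.
    """
    op = (word >> 25) & 0x3F       # 6-bit VOP2 opcode
    if word >> 31 != 0 or op in (0x3E, 0x3F):
        # Bit 31 set means another format; op 0x3E/0x3F select VOPC/VOP1.
        raise ValueError("not a VOP2 word")
    vdst = (word >> 17) & 0xFF     # destination VGPR index
    vsrc1 = (word >> 9) & 0xFF     # second source: always a VGPR in VOP2
    src0 = word & 0x1FF            # first source: full 9-bit operand code
    if src0 == LITERAL:
        raise ValueError("trailing literal constants are not handled")
    first = (VOP3_ENCODING << 26) | ((VOP3_OP_OFFSET + op) << 16) | vdst
    # VOP3's second source is a full 9-bit operand, so the VGPR index
    # has to be rebased into the 256..511 range.
    second = src0 | ((VGPR_BASE + vsrc1) << 9)
    return first, second


# v_mul_f32_e32 v7, s1, v3, assuming v_mul_f32 has VOP2 opcode 8 on this target.
e32 = (8 << 25) | (7 << 17) | (3 << 9) | 1
first, second = expand_vop2_to_vop3(e32)
print(f"{e32:08X} -> {first:08X} {second:08X}")  # prints: 100E0601 -> D5080007 00020601
```

The first dword comes out identical to the one on the v_mul_f32_e64 line in the disassembly below, since the opcode and destination are the same; the second dword differs because that line uses different operands (v3, -s1).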

I noticed this difference by inspecting the output manually, and because other architectures use the same trick. For instance, look at the instruction bits that the disassembler prints alongside each instruction (given the right option):

v_add_co_ci_u32_e32 v5, vcc_lo, s3, v1, vcc_lo  // 000000002158: 500A0203
                                                                 ^^^^^^^^ 32b
...
v_mul_f32_e64 v7, v3, -s1                       // 000000002198: D5080007 40000303
                                                                 ^^^^^^^^ ^^^^^^^^ 64b
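To check that the hex words really are the instruction encodings, here is a small Python sketch that splits them into fields. The bit positions are the VOP2 and VOP3 layouts as I remember them from the RDNA ISA manuals, and operand naming is simplified (only SGPRs 0 to 105 and VGPRs are named). Opcode numbers are left as raw integers, because they change between RDNA generations. All function names are hypothetical.

```python
def operand(code: int) -> str:
    """Name a 9-bit source operand code (simplified: SGPRs and VGPRs only)."""
    if code >= 256:
        return f"v{code - 256}"
    if code <= 105:
        return f"s{code}"
    return f"<operand {code}>"  # vcc, exec, inline constants, etc.


def decode_vop2(word: int) -> dict:
    """Fields of a 32-bit VOP2 (_e32) instruction word."""
    return {
        "op": (word >> 25) & 0x3F,
        "vdst": f"v{(word >> 17) & 0xFF}",
        "src0": operand(word & 0x1FF),
        "vsrc1": f"v{(word >> 9) & 0xFF}",  # always a VGPR in VOP2
    }


def decode_vop3(first: int, second: int) -> dict:
    """Fields of a 64-bit VOP3 (_e64) instruction (two dwords)."""
    return {
        "encoding": first >> 26,            # 0b110101 marks VOP3
        "op": (first >> 16) & 0x3FF,
        "clamp": (first >> 15) & 1,
        "abs": (first >> 8) & 0x7,          # one bit per source
        "vdst": f"v{first & 0xFF}",
        "src0": operand(second & 0x1FF),
        "src1": operand((second >> 9) & 0x1FF),
        "src2": (second >> 18) & 0x1FF,     # raw code; unused by two-source ops
        "omod": (second >> 27) & 0x3,
        "neg": (second >> 29) & 0x7,        # one bit per source
    }


print(decode_vop2(0x500A0203))
print(decode_vop3(0xD5080007, 0x40000303))
```

For the first line this gives vdst v5, src0 s3 and vsrc1 v1; for the second it gives vdst v7, src0 v3, src1 s1 and neg = 0b010 (negate the second source), matching the disassembly. Note that the two vcc_lo operands of v_add_co_ci_u32_e32 appear nowhere in the 32-bit word: in the compact form the carry is implicitly VCC, which is exactly the kind of default the 64-bit form spells out.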

Some instructions lack any _eXX suffix. My guess is that those instructions have a single, fixed-size encoding, so there is no ambiguity to resolve: hardware and software agree that such ops have only one encoding size. For example, this scalar load is always 64 bits:

s_load_dword s5, s[6:7], 0x30                   // 00000000240C: F4000143 FA000030
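The same kind of check for the scalar load, as a Python sketch assuming the RDNA2-style SMEM layout (the bytes above are consistent with it; cache-control bits are ignored here). It shows that the immediate offset and the SGPR offset occupy the second dword, and SMEM has no shorter variant, so there is nothing for a suffix to disambiguate. The function name is hypothetical.

```python
def decode_smem(first: int, second: int) -> dict:
    """Fields of a 64-bit scalar-memory (SMEM) instruction, RDNA2-style layout."""
    if first >> 26 != 0b111101:
        raise ValueError("bits 31:26 must be 0b111101 for SMEM")
    base = 2 * (first & 0x3F)  # SBASE names an aligned SGPR pair, stored halved
    return {
        "op": (first >> 18) & 0xFF,           # 0 = s_load_dword, per the sample
        "sdata": f"s{(first >> 6) & 0x7F}",   # first destination SGPR
        "sbase": f"s[{base}:{base + 1}]",
        "offset": second & 0x1FFFFF,          # 21-bit immediate byte offset
        "soffset": second >> 25,              # 125 = null on RDNA2 (no SGPR offset)
    }


print(decode_smem(0xF4000143, 0xFA000030))
```

This yields sdata s5, sbase s[6:7] and offset 0x30 (48), matching s_load_dword s5, s[6:7], 0x30.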

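You could dig around in the instruction-format chapter of the ISA manual to see if this is covered, but I would not be surprised if the manual omits it. The manual does name the formats themselves: the 32-bit vector ALU encodings are VOP1, VOP2 and VOPC, and the general 64-bit one is VOP3, so searching for those names should at least turn up the bit layouts.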