Arm Architecture Reference Manual for A-profile architecture (emphasis added):
FPHP, bits [27:24]
0b0011 As for 0b0010, and adds support for half-precision floating-point arithmetic.
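As a small illustration of the field quoted above, here is a sketch in C that decodes FPHP from a raw feature-register value. It assumes the caller has already obtained the 32-bit register contents somehow (as far as I can tell the field lives in MVFR1, but treat that as an assumption); the function names are hypothetical and only the 0b0011 encoding quoted above is checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the FPHP field, bits [27:24], from a raw register value.
 * How the register value is read is outside the scope of this sketch. */
static unsigned fphp_field(uint32_t reg)
{
    return (reg >> 24) & 0xFu;
}

/* True when FPHP reads 0b0011, the encoding the manual describes as
 * adding support for half-precision floating-point arithmetic. */
static bool fphp_has_fp16_arith(uint32_t reg)
{
    return fphp_field(reg) == 0x3u;
}
```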
A simple question: where can I find a list of the Arm instructions that implement half-precision floating-point arithmetic?
UPD. Per Clang for Arm (armclang) documentation:
The __fp16 data type is not an arithmetic data type. The __fp16 data type is for storage and conversion only.
The _Float16 data type is an arithmetic data type. Operations on _Float16 values use half-precision arithmetic.
Hence, when using Clang for Arm I need to use _Float16 (not __fp16).
Per GCC for Arm documentation:
The __fp16 type may only be used as an argument to intrinsics defined in <arm_fp16.h>, or as a storage format. For purposes of arithmetic and other operations, __fp16 values in C or C++ expressions are automatically promoted to float. It is recommended that portable code use the _Float16 type defined by ISO/IEC TS 18661-3:2015.
Hence, when using GCC for Arm I need to use _Float16 (not __fp16).
However, why then does GCC for Arm generate vmul.f16 in this example from Nate Eldredge, instead of half<->float conversions followed by vmul.f32? Per the quote above, __fp16 values in C or C++ expressions are automatically promoted to float. Why are they not promoted to float in this case?
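To make the difference between the two evaluation models concrete, here is a minimal sketch in portable C that emulates them in software, so it runs on any host. The helper round_to_half and the two mul_add_* functions are hypothetical names introduced for illustration; they are not part of any Arm or compiler API, and the helper only handles zero and results in the binary16 normal range. As far as the numerical result goes, a single multiplication stored back to half precision comes out the same under both models, because the product of two 11-bit significands is exact in float; the models only diverge once an intermediate result is kept in float, as in a * b + c below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest IEEE binary16 value (ties to even) and
 * return it as a float. Hypothetical helper: it only supports zero and
 * results whose magnitude lies in the binary16 normal range. */
static float round_to_half(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7FFFFFFFu) == 0)
        return x;                           /* +0 or -0 */

    /* binary32 has 23 fraction bits, binary16 has 10: drop the low
     * 13 bits, rounding to nearest with ties to even. */
    bits += 0x0FFFu + ((bits >> 13) & 1u);
    bits &= ~0x1FFFu;

    float r;
    memcpy(&r, &bits, sizeof r);
    float mag = r < 0 ? -r : r;
    assert(mag >= 0x1p-14f && mag <= 65504.0f); /* stated limitation */
    return r;
}

/* __fp16-style model: operands are promoted to float, the whole
 * expression is evaluated in float, and the result is rounded once. */
static float mul_add_promoted(float a, float b, float c)
{
    return round_to_half(a * b + c);
}

/* _Float16-style model: every operation rounds to half precision. */
static float mul_add_half(float a, float b, float c)
{
    return round_to_half(round_to_half(a * b) + c);
}
```

With a = 1 + 2^-10, b = 1 + 3*2^-10 and c = -1 (all exactly representable in half precision), mul_add_half returns 2^-8, while mul_add_promoted returns 2^-8 + 2^-18, because the low bits of the product survive until the final rounding.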
It's not really a separate list. When this feature is present, basically all the floating-point instructions that already exist gain support for half-precision.
In AArch64 state, you use the same floating-point instruction mnemonics, using h registers or vector element sizes to specify a half-precision operation. For example, fadd h0, h1, h2 does a half-precision floating-point add (scalar), and fadd v0.8h, v1.8h, v2.8h does eight such adds in parallel (vector).
In AArch32 state, you use a .f16 suffix on the mnemonic. So vadd.f16 s0, s1, s2 does a scalar half-precision add (in 32-bit state the h register names are not used, and the result is zero-extended into the 32-bit s register). Or (untested) vadd.f16 d0, d1, d2 for a four-element vector add, or vadd.f16 q0, q2, q4 for eight elements.
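As a rough C-level counterpart, the sketch below uses _Float16 where the compiler advertises the IEEE half format through the ACLE macro __ARM_FP16_FORMAT_IEEE, and falls back to float elsewhere so that it still builds on a non-Arm host. The names half_t, add_half and add_half_x8 are made up for this example. On a target with the FP16 arithmetic extension enabled (for instance something like -march=armv8.2-a+fp16 on AArch64), one would expect the compiler to turn add_half into a single half-precision add and, with optimization, possibly to vectorize the loop; I have not checked the generated code.

```c
#include <assert.h>

#if defined(__ARM_FP16_FORMAT_IEEE)
/* Arm target with the IEEE half-precision format: use the arithmetic type. */
typedef _Float16 half_t;
static_assert(sizeof(half_t) == 2, "binary16 storage expected");
#else
/* Fallback (assumption for this sketch only) so the code builds elsewhere. */
typedef float half_t;
#endif

/* Scalar add; candidate for "fadd h0, h0, h1" or "vadd.f16 s0, s0, s1". */
half_t add_half(half_t a, half_t b)
{
    return a + b;
}

/* Eight element-wise adds; candidate for one eight-lane vector add. */
void add_half_x8(half_t *dst, const half_t *a, const half_t *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = a[i] + b[i];
}
```

Compiling this with and without the FP16 extension enabled and comparing the assembly is a quick way to see which instructions the feature adds for a given operation.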
If you really want a list of all the instruction forms added by the FP16 feature, you can skim the tables in the Instruction Set Encoding chapters of the Architecture Reference Manual and look for FP16 in the Feature column. Or search for (FEAT_FP16) in the instruction descriptions chapter.