I'm looking for an equivalent to x86/64's FTZ/DAZ instructions found in <immintrin.h>, but for M1/M2/M3. Also, is it safe to assume that "apple silicon" equals ARM?
I am in the process of porting a realtime audio plugin (VST3/CLAP) from x64 Windows to MacOS on apple silicon hardware. At least on x64, it is important for realtime audio code, that denormal numbers (also known as subnormal numbers) are treated as zero by the hardware since these very-small-numbers are otherwise handled in software and that causes a real performance hit.
Now, as denormal numbers are part of the IEEE floating point standard, and they are explicitly mentioned over here https://developer.arm.com/documentation/ddi0403/d/Application-Level-Architecture/Application-Level-Programmers--Model/The-optional-Floating-point-extension/Floating-point-data-types-and-arithmetic?lang=en#BEICCFII, I believe there must be an equivalent to intel's _MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE macros. Of course, I might be mistaken, or maybe the hardware flushes to zero by default (it's not really clear to me from the ARM document), in which case, I'd like to know that, too.
Include <fenv.h>
and use:
int r = fesetenv(FE_DFL_DISABLE_DENORMS_ENV);
// check r == 0
From man fegetenv
:
The fesetenv() function attempts to establish the floating-point environment
represented by the object pointed to by envp. This object shall have been
set by a call to fegetenv() or feholdexcept(), or be equal to a floating-point
environment macro defined in <fenv.h>.
And from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/fenv.h
(gated behind an __arm64__
ifdef check):
/* FE_DFL_DISABLE_DENORMS_ENV
A pointer to a fenv_t object with the default floating-point state modified
to set the FZ (flush to zero) bit in the FPCR. When using this environment
denormals encountered by floating-point calculations will be treated as
zero. Denormal results of floating-point operations will also be treated
as zero. This calculation mode is not IEEE-754 compliant, but it may
prevent lengthy stalls that occur in code that encounters denormals. It is
suggested that you do not use this mode unless you have established that
denormals are the source of measurable performance problems.
Note that the math library, and other system libraries, are not guaranteed
to do the right thing if called in this mode. Edge cases may be incorrect.
Use at your own risk. */
extern const fenv_t _FE_DFL_DISABLE_DENORMS_ENV;
#define FE_DFL_DISABLE_DENORMS_ENV &_FE_DFL_DISABLE_DENORMS_ENV
Test code:
#include <fenv.h>
#include <stdint.h>
#include <stdio.h>
typedef volatile union
{
float f;
uint32_t u;
} num_debug_t;
int main(void)
{
num_debug_t n = { .u = 0x00800001 };
printf("Hex value: 0x%08x\n", n.u);
printf("Float value: %e\n", n.f);
printf("===== Normalised =====\n");
num_debug_t d = { .f = n.f / 2.0f };
printf("Division result hex: 0x%08x\n", d.u);
printf("Division result float: %e\n", d.f);
int r = fesetenv(FE_DFL_DISABLE_DENORMS_ENV);
if(r != 0)
{
fprintf(stderr, "fesetenv returned %d\n", r);
return -1;
}
printf("===== Denormalised =====\n");
d.f = n.f / 2.0f;
printf("Division result hex: 0x%08x\n", d.u);
printf("Division result float: %e\n", d.f);
return 0;
}
Output:
Hex value: 0x00800001
Float value: 1.175494e-38
===== Normalised =====
Division result hex: 0x00400000
Division result float: 5.877472e-39
===== Denormalised =====
Division result hex: 0x00000000
Division result float: 0.000000e+00
Under the hood, this simply sets bit 24 (FZ) in the FPCR
system register.