I'm trying to cram a lot of code into a reasonably small ARM microcontroller. I've done a massive amount of work on size optimisation already, and I'm down to the point where I need double arithmetic, but __aeabi_ddiv, __aeabi_dadd and __aeabi_dsub are some of the biggest functions on the whole device.
Both __aeabi_dadd and __aeabi_dsub are ~1700 bytes each, despite doing basically the same job (the very top bit of doubles is the sign bit). Neither function references the other one.
Realistically all I need to do is replace __aeabi_dsub with:
double __aeabi_dsub(double a, double b) {
    // flip top bit of 64 bit number (the sign bit)
    ((uint32_t*)&b)[1] ^= 0x80000000; // assume little endian
    return a + b;
}
and I'd save ~1700 bytes - so flipping the sign of the second argument, then adding them using __aeabi_dadd.
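For reference, here is a minimal sketch of the same sign-flip idea written without the pointer cast, so it doesn't depend on word order or run into strict-aliasing rules. The name dsub_via_dadd is hypothetical (chosen so it can be built and tested on a desktop without clashing with the real runtime symbol), and it assumes double is a 64-bit IEEE 754 value stored with the same byte order as uint64_t:

```c
#include <stdint.h>

/* Hypothetical stand-in for __aeabi_dsub: computes a - b as a + (-b). */
double dsub_via_dadd(double a, double b) {
    /* Reading a different union member than the one last written is
       supported by GCC as a way to reinterpret the bits. */
    union { double d; uint64_t u; } v;
    v.d = b;
    v.u ^= UINT64_C(1) << 63;  /* flip the top bit (the sign bit) */
    return a + v.d;            /* on a soft-float ARM target this should become a call to __aeabi_dadd */
}
```

On the target, the body would go under the real name __aeabi_dsub, as in the version above.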
I'm aware that this may not be 100% compatible with the IEEE spec, but on this platform I'm ok with that in order to save > 1% of my available flash.
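To get a feel for how far off the spec this might be, a quick host-side check along these lines can compare a - b against a + (-b) on the awkward inputs (signed zeros, infinities, NaN). This is only a sketch: flip_sign and sub_matches_add_neg are made-up helper names, and it treats any two NaN results as equal, since as far as I can tell the NaN sign/payload is the only place the two could plausibly differ:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate x by flipping its sign bit. */
static double flip_sign(double x) {
    union { double d; uint64_t u; } v;
    v.d = x;
    v.u ^= UINT64_C(1) << 63;
    return v.d;
}

/* Returns 1 when a - b and a + (-b) agree bit for bit,
   or when both results are NaN (sign and payload ignored). */
int sub_matches_add_neg(double a, double b) {
    double direct = a - b;
    double via_add = a + flip_sign(b);
    if (isnan(direct) || isnan(via_add)) {
        return isnan(direct) && isnan(via_add);
    }
    return memcmp(&direct, &via_add, sizeof direct) == 0;
}
```

Running it over cases such as (0.0, 0.0), (-0.0, 0.0), (INFINITY, INFINITY) and (1.0, NAN) on a desktop gives some confidence before trusting the replacement on the device.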
My problem is that when I add that function, the linker complains with undefined reference to __aeabi_dsub - which seems strange given that it's the act of defining it that causes the error.
This appears to be related to link time optimisation (-flto) - turning it off means it all works perfectly, however it adds 8k to the firmware size, so it no longer fits in available flash!
So what do I need to do to be able to replace the built-in function __aeabi_dsub when link time optimisation is active?
thanks!
The solution for me (as suggested by @artless-noise) was to use the -ffreestanding compiler flag. GCC has this to say about it:
Assert that compilation targets a freestanding environment... A freestanding environment is one in which the standard library may not exist, and program startup may not necessarily be at main. The most obvious example is an OS kernel.
So it seems to make a lot of sense for an embedded environment anyway...
This added ~250 bytes to the firmware size (about 0.1%), I guess because it stopped the compiler taking advantage of some assumptions about built-in operators. However, it did allow me to add my own __aeabi_dsub implementation, which saved 1680 bytes in total.