Many implementations of the library ultimately go down to the FPATAN instruction for all arc functions. How is FPATAN implemented? Assuming we have a number with 1 sign bit, M mantissa bits and N exponent bits, what algorithm computes its arctangent? Such an algorithm must exist, since the FPU does it.
Trigonometric functions do tend to have pretty ugly implementations that are hacky and do lots of bit fiddling. I think it will be hard to find someone here who can explain an algorithm that is actually used in practice.
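For reference, an IEEE-754 double is the case M = 52 mantissa bits and N = 11 exponent bits. A minimal sketch of pulling those fields out in C is below; the `double_fields` struct and `split_double` helper are hypothetical names for illustration. Library code such as fdlibm does this kind of bit extraction (reading the high word of the double) to choose a range before doing any arithmetic.

```c
#include <stdint.h>
#include <string.h>

/* Raw fields of an IEEE-754 binary64 value:
   1 sign bit, 11 exponent bits (N = 11), 52 mantissa bits (M = 52). */
typedef struct {
    unsigned sign;      /* 0 for +, 1 for - */
    unsigned exponent;  /* biased exponent, 0..2047 (bias is 1023) */
    uint64_t mantissa;  /* fraction bits, without the implicit leading 1 */
} double_fields;

double_fields split_double(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined way to reinterpret the bits */
    double_fields f;
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For example, -2.5 = -1.25 * 2^1, so it splits into sign 1, biased exponent 1024, and a fraction of 0.25, i.e. mantissa bits equal to 2^50.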
Here is an atan2 implementation: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/e_atan2.c;h=a287ca6656b210c77367eec3c46d72f18476d61d;hb=HEAD
Edit: Actually, I found this one: http://www.netlib.org/fdlibm/e_atan2.c, which is a lot easier to follow, but probably slower because of that (?).
The FPU does all of this in its own circuitry, so software running on the CPU doesn't have to do this work.