c++optimizationsquare-root

Fast inverse arbitrary power root algorithm implementation


Many sources indicate that well-known fast inverse square root algorithm can be generalized to calculation arbitrary power inverse root. Unfortunately I have not found such C++ implementation and I'm not so good at math to generalize this method by myself. Could you help me to do this or perhaps provide a ready-made solution? I think this will be useful to many, especially with good explanations.

This is the original algorithm and I do not quite understand what I need to change to get for example 1 /cbrt(x):

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the...? 
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

Solution

  • The algorithm consists of two steps - a rough solution estimation and the solution improvement using several Newton method steps.

    Rough estimation

    The basic idea is to use relationship between float number logarithm log2(x) and its integer representation Ix:

    enter image description here

    (Image from https://en.wikipedia.org/wiki/Fast_inverse_square_root)

    Now use the well-known logarithm identity for root:

    .

    Combining the identities obtained earlier, we get:

    Substituting numerical values L * (B - s) = 0x3F7A3BEA, so

    Iy = 0x3F7A3BEA / c * (c + 1) - Ix / c;.

    For a simple float point number representation as an integer and back it is convenient to use union type:

       union 
       { 
          float f; // float representation
          uint32_t i; // integer representation
       } t;
    
       t.f = x;
       t.i = 0x3F7A3BEA / n * (n + 1) - t.i / n; // Work with integer representation
       float y = t.f; // back to float representation
    

    Note that for n=2 the expression is simplified to t.i = 0x5f3759df - t.i / 2; which is identical to original i = 0x5f3759df - ( i >> 1 );

    Newton's solution improvement

    Transform equality

    enter image description here

    into an equation that should be solved:

    Now construct Newton steps:

    Programmatically it looks like: y = y * (1 + n - x * pow(y,n)) / n;. As an initial y, we use the value obtained at Rough estimation step.

    Note for the particular case of square root (n = 2) we get y = y * (3 - x*y*y) / 2; which is identically to the original formula y = y * (threehalfs - (x2 * y * y));

    Final code as template function. Parameter N determines root power.

    template<unsigned N>
    float power(float x) {
       if (N % 2 == 0) return power<N / 2>(x * x);
       else if (N % 3 == 0) return power<N / 3>(x * x * x);
       return power<N - 1>(x) * x;
    };
    
    template<>
    float power<0>(float x){ return 1; }
    
    // fast_inv_nth_root<2>(x) - inverse square root 1/sqrt(x)
    // fast_inv_nth_root<3>(x) - inverse cube root 1/cbrt(x)
    
    template <unsigned n>
    float fast_inv_nth_root(float x)
    {
       union { float f; uint32_t i; } t = { x };
    
       // Approximate solution
       t.i = 0x3F7A3BEA / n * (n + 1) - t.i / n;
       float y = t.f;
    
       // Newton's steps. Copy for more accuracy.
       y = y * (n + 1 - x * power<n>(y)) / n;
       y = y * (n + 1 - x * power<n>(y)) / n;
       return y;
    }
    

    Testing

    Testing code:

    int main()
    {
       std::cout << "|x          ""|fast2      "" actual2    "
          "|fast3      "" actual3    "
          "|fast4      "" actual4    "
          "|fast5      "" actual5    ""|" << std::endl;
    
       for (float i = 0.00001; i < 10000; i *= 10)
          std::cout << std::setprecision(5) << std::fixed
          << std::scientific << '|'
          << i << '|'
          << fast_inv_nth_root<2>(i) << " " << 1 / sqrt(i) << "|"
          << fast_inv_nth_root<3>(i) << " " << 1 / cbrt(i) << "|"
          << fast_inv_nth_root<4>(i) << " " << pow(i, -0.25) << "|"
          << fast_inv_nth_root<5>(i) << " " << pow(i, -0.2) << "|"
          << std::endl;
    }
    

    Results:

    |x          |fast2       actual2    |fast3       actual3    |fast4       actual4    |fast5       actual5    |
    |1.00000e-05|3.16226e+02 3.16228e+02|4.64152e+01 4.64159e+01|1.77828e+01 1.77828e+01|9.99985e+00 1.00000e+01|
    |1.00000e-04|9.99996e+01 1.00000e+02|2.15441e+01 2.15443e+01|9.99991e+00 1.00000e+01|6.30949e+00 6.30957e+00|
    |1.00000e-03|3.16227e+01 3.16228e+01|1.00000e+01 1.00000e+01|5.62339e+00 5.62341e+00|3.98103e+00 3.98107e+00|
    |1.00000e-02|9.99995e+00 1.00000e+01|4.64159e+00 4.64159e+00|3.16225e+00 3.16228e+00|2.51185e+00 2.51189e+00|
    |1.00000e-01|3.16227e+00 3.16228e+00|2.15443e+00 2.15443e+00|1.77828e+00 1.77828e+00|1.58487e+00 1.58489e+00|
    |1.00000e+00|9.99996e-01 1.00000e+00|9.99994e-01 1.00000e+00|9.99991e-01 1.00000e+00|9.99987e-01 1.00000e+00|
    |1.00000e+01|3.16226e-01 3.16228e-01|4.64159e-01 4.64159e-01|5.62341e-01 5.62341e-01|6.30948e-01 6.30957e-01|
    |1.00000e+02|9.99997e-02 1.00000e-01|2.15443e-01 2.15443e-01|3.16223e-01 3.16228e-01|3.98102e-01 3.98107e-01|
    |1.00000e+03|3.16226e-02 3.16228e-02|1.00000e-01 1.00000e-01|1.77827e-01 1.77828e-01|2.51185e-01 2.51189e-01|
    |1.00000e+04|9.99996e-03 1.00000e-02|4.64155e-02 4.64159e-02|9.99995e-02 1.00000e-01|1.58487e-01 1.58489e-01|