c++ language-lawyer std

Is there an inlinable function for pow2?


Does the C++ standard library have an inlinable template for pow2 (squaring) that works efficiently for all POD types?

I see that we have std::pow, std::powf, std::powl, but:

  1. I can't be sure that every compiler will inline them. More than that, I am sure that in Debug mode most of them won't. (MSVC won't even in Release.)
  2. These three work with float, double and long double, while sometimes I need it for int and std::size_t.

(I have these squaring operations in time-critical sections of code (calculating distances between many objects, etc.), so optimal generated code is important. At the same time, the arguments are sometimes long to write out and I don't want to duplicate them in place.)

Of course, I can write:

template <typename T>
T pow2(T v) {
    return v*v;
}

But:

  1. I don't want to reinvent the wheel, even for such tiny pieces of code, if something already exists and is recommended for use.
  2. I am not sure that it will be the most efficient code in all cases.

And, at the end of the day, if we don't have this in std, why not? And if the answer is "because pow2 is a very special case", why do we have std::pow only for floating-point types? Am I the only person who needs pow2 for integral types?


Solution

  • Whether or not the standard library implements a function in such a way that it can be inlined (whether with or without LTO) is completely a decision of the implementation.

    Nothing in the standard is concerned with implementation details such as optimizations and whether or not they can be applied. The standard only describes observable behavior of programs. In particular there are no performance guarantees except for asymptotic complexity of certain library functions in terms of input length.

    Nothing prevents the implementation from defining std::pow as inline in a header, nor does an implementation need to define the different overloads mentioned in the standard as such. The implementation could for example define std::pow as a single function template (in a header or via explicit specialization somewhere else).

    But none of this is really relevant for what you want. std::pow operates on two floating-point arguments. It must be written to support the power calculation for all floating-point exponents. An algorithm for general powers will be much slower than simply multiplying two values, which is all a square function requires.

    Of course, an implementation could test whether the exponent is exactly 2 and branch to a simpler implementation for that special case, but I doubt they actually do, given the intended use of the power function. Even if they did, that is still an extra branch that is costly relative to a single multiplication.

    In any case, this implementation detail of std::pow is much more important than whether the function is implemented in such a way that it can be inlined.
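
As a small illustration of the point about floating-point arguments: since C++11, calling std::pow with integer arguments converts them to double, so the result is a double as well. The sketch below checks that at compile time and puts the two ways of squaring side by side. The names square_via_pow and square_via_mul are made up for this example.

```cpp
#include <cmath>
#include <type_traits>

// Integer arguments are converted to double, so the result type is double.
static_assert(std::is_same<decltype(std::pow(3, 2)), double>::value,
              "std::pow(int, int) yields double");

// Hypothetical helper: squares through the general power routine.
inline double square_via_pow(double x) { return std::pow(x, 2.0); }

// Hypothetical helper: squares with a single multiplication.
constexpr double square_via_mul(double x) { return x * x; }
```

Whether a particular compiler simplifies the std::pow call is an implementation detail, so inspect the generated code if it matters.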


    There is no specific square function in the standard library. You will need to write your own, for example as you have shown. This is true if performance is even remotely important, but also if accuracy of the result is important, because std::pow (in contrast to simple multiplication) does not make any guarantees on the accuracy of the result in general.
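
A minimal sketch of such a hand-written helper, in the spirit of the template from the question. The names square, Point and distance_squared are hypothetical and only serve the example; marking the functions constexpr is my own choice so they can also be used in constant expressions.

```cpp
// Hypothetical helper, not part of the standard library.
// Works for any type T for which v * v is valid and convertible to T.
template <typename T>
constexpr T square(T v) {
    return v * v;
}

// Example use: squared Euclidean distance between two 2D points,
// where the argument expressions are written only once.
struct Point {
    double x;
    double y;
};

constexpr double distance_squared(Point a, Point b) {
    return square(a.x - b.x) + square(a.y - b.y);
}
```

A function template defined in a header like this is visible to the optimizer at every call site, but whether it actually gets inlined (especially in unoptimized builds) is still up to the compiler.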


    My guess as to why std::pow is in the library but a function for squaring or general integer powers is not: a correct algorithm for std::pow is much more difficult to implement and depends more strongly on the architecture. A decent squaring function is easy to write by hand.

    Also, a power function operating purely on integer types isn't all that useful, because the result is representable only for a small range of input values. Even the second power (squaring) can easily overflow and often requires specific handling of these cases.
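
To illustrate the overflow concern: for a 32-bit signed integer, any input above 46340 already produces a square that does not fit. The sketch below shows two possible ways of handling this, assuming fixed-width types from <cstdint>: widening before the multiplication, or testing up front whether the square is representable. Both function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: widen to 64 bits first, so the product of two
// 32-bit values always fits.
constexpr std::int64_t square_wide(std::int32_t v) {
    return static_cast<std::int64_t>(v) * v;
}

// Hypothetical helper: true if v * v is not representable in the
// integral type T. The division avoids computing the overflowing product.
template <typename T>
constexpr bool square_overflows(T v) {
    return (v > 0 && v > std::numeric_limits<T>::max() / v) ||
           (v < 0 && v < std::numeric_limits<T>::max() / v);
}
```

Which of the two fits better depends on whether a wider type is available and on what the caller should do when the result is out of range.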


    "Am I the only person who needs pow2 for integral types?"

    It is really rare that I come across a situation where this is needed outside of compile-time constants and in these situations, simply writing a multiplication is usually fine.

    For floating-point operations this is a bit different, because longer mathematical expressions containing multiple squaring operations sometimes make sense. Sometimes it doesn't make sense to name intermediate results just so that * can be used efficiently, and sometimes it is important not to store the result in a variable, so that intermediate results can keep extended precision or be contracted. (Although the latter requires a macro instead of a function.)