c++ templates expression-templates

Why haven't modern compilers removed the need for expression templates?


The standard pitch for expression templates in C++ is that they increase efficiency by removing unnecessary temporary objects. Why can't C++ compilers already remove these unnecessary temporary objects?


This is a question that I think I already know the answer to but I want to confirm since I couldn't find a low-level answer online.

Expression templates essentially allow/force an extreme degree of inlining. However, even with inlining, compilers cannot optimize out calls to operator new and operator delete because they treat those calls as opaque since those calls can be overridden in other translation units. Expression templates completely remove those calls for intermediate objects.

These superfluous calls to operator new and operator delete can be seen in a simple example where we only copy:

#include <array>
#include <vector>

std::vector<int> foo(std::vector<int> x)
{
    std::vector<int> y{x};
    std::vector<int> z{y};
    return z;
}

std::array<int, 3> bar(std::array<int, 3> x)
{
    std::array<int, 3> y{x};
    std::array<int, 3> z{y};
    return z;
}

In the generated code, we see that foo() compiles to a relatively lengthy function with two calls to operator new and one call to operator delete while bar() compiles to only a transfer of registers and doesn't do any unnecessary copying.

Is this analysis correct?

Could any C++ compiler legally elide the copies in foo()?


Solution

  • However, even with inlining, compilers cannot optimize out calls to operator new and operator delete because they treat those calls as opaque since those calls can be overridden in other translation units.

    Since C++14, this is no longer true: allocation calls can be optimized out or merged under certain conditions:

    [expr.new#10] An implementation is allowed to omit a call to a replaceable global allocation function. When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression. [conditions follow]

    So foo() may be legally optimized to something equivalent to bar() nowadays ...
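As a minimal sketch of the kind of code this rule covers (the function name `sum_via_heap` is made up for illustration), the new-expression below allocates storage that never escapes the function. Since C++14 a conforming compiler may omit the matching allocation and deallocation calls. Whether a particular compiler actually does so depends on the implementation and the optimization level.

```cpp
// Hypothetical example: the heap object is created, read once, and destroyed
// inside the same function, so its storage never escapes.
int sum_via_heap(int a, int b)
{
    int* tmp = new int(a + b);  // new-expression: the call to operator new may be omitted
    int result = *tmp;
    delete tmp;                 // the matching operator delete call may be omitted too
    return result;
}
```

The observable result is the same either way; only the calls to the replaceable allocation functions are allowed to disappear.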


    Expression templates essentially allow/force an extreme degree of inlining

    IMO the point of expression templates is not so much inlining per se; it is rather exploiting the symmetries of the type system of the domain-specific language that the expression models.

    For example, when you multiply three, say, Hermitian matrices, an expression template can select a space- and time-optimized algorithm that exploits the fact that the product is associative and that Hermitian matrices are adjoint-symmetric, reducing the total operation count (and possibly even improving accuracy). All of this occurs at compile time.

    Conversely, a compiler cannot know what a Hermitian matrix is; it is constrained to evaluate the expression the brute-force way (according to your implementation's floating-point semantics).
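The Hermitian case is too long to show here, so the following is a simpler stand-in for the same idea, using only associativity. All types (`Vec`, `Mat`, `MatProduct`) are hypothetical and written for this sketch; they are not from any real library. Multiplying two matrices only records an expression node, and the overload for `(A * B) * v` reassociates to `A * (B * v)`, replacing an O(n^3) matrix product with two O(n^2) matrix-vector products. An optimizer working on plain floating-point loops is generally not allowed to make this change by itself, because floating-point arithmetic is not associative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec {
    std::vector<double> d;
};

struct Mat {
    std::size_t n;            // square n x n matrix, row-major storage
    std::vector<double> d;
    double operator()(std::size_t i, std::size_t j) const { return d[i * n + j]; }
};

// Expression node: records "lhs * rhs" without computing anything.
struct MatProduct {
    const Mat& lhs;
    const Mat& rhs;
};

// Matrix * matrix only builds the node.
inline MatProduct operator*(const Mat& a, const Mat& b)
{
    return {a, b};
}

// Matrix * vector is evaluated directly in O(n^2).
inline Vec operator*(const Mat& a, const Vec& v)
{
    Vec r{std::vector<double>(a.n, 0.0)};
    for (std::size_t i = 0; i < a.n; ++i)
        for (std::size_t j = 0; j < a.n; ++j)
            r.d[i] += a(i, j) * v.d[j];
    return r;
}

// (A * B) * v: overload resolution selects this at compile time and
// reassociates to A * (B * v), so the product A * B is never formed.
inline Vec operator*(const MatProduct& p, const Vec& v)
{
    return p.lhs * (p.rhs * v);
}
```

With these definitions, an expression written as `a * b * v` parses as `(a * b) * v` but is evaluated as `a * (b * v)`. The choice is made by the library author through the type system, which is the kind of domain knowledge the compiler does not have.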