c++templates c++14 template-meta-programming valarray

C++ class with nested expression templates

I want to define a class, called Nested here, that will contains two or more (one here) data members that support arithmetic operations using expression templates, for example an std::valarray. For this class itself, I am defining its own expression templates and I want to "forward" the arithmetic operations down to the members.

A minimal (non)working example is given below:

#include <iostream>
#include <valarray>

template <typename E>
struct NestedExpr {
    operator const E& () const {
        return *static_cast<const E*>(this);
    }
};

template <typename A>
class Nested : public NestedExpr <Nested<A>>{
private:
    A a;
public:
    Nested(const A& _a) : a(_a) {}

    template <typename E>
    inline Nested<A>& operator = (const NestedExpr<E>& _expr) {
        const E& expr(_expr);
        a = expr.get_a();
        return *this;
    }

    inline       A& get_a()       { return a; }
    inline const A& get_a() const { return a; }
};

// ================================================================= //
template <typename ARG, typename S>
class NestedMul : public NestedExpr<NestedMul<ARG, S>> {
public:
    const ARG& arg;
    const S      s;
    NestedMul(const ARG& _arg, S _s) : arg(_arg), s(_s) {}
    inline auto get_a() const { return arg.get_a() * s; };
};

template< typename ARG, typename S>
inline NestedMul<ARG, S> operator * (S s, const NestedExpr<ARG>& arg) {
    return {arg, s};
}

// ================================================================= //
template <typename ARG1, typename ARG2>
class NestedAdd : public NestedExpr<NestedAdd<ARG1, ARG2>> {
public:
    const ARG1& arg1;
    const ARG2& arg2;
    NestedAdd(const ARG1& _arg1, const ARG2& _arg2)
        : arg1(_arg1), arg2(_arg2) {}
    inline auto get_a() const { return arg1.get_a() + arg2.get_a(); };
};

template<typename ARG1, typename ARG2>
inline NestedAdd<ARG1, ARG2> 
operator + (const NestedExpr<ARG1>& arg1, const NestedExpr<ARG2>& arg2) {
    return {arg1, arg2};
}

int main () {
    std::valarray<double> x1 = {4.0};
    std::valarray<double> x2 = {3.0};
    std::valarray<double> x3 = {0.0};
    std::valarray<double> x4 = {0.0};

    auto a = Nested<std::valarray<double>>(x1);
    auto b = Nested<std::valarray<double>>(x2);
    auto c = Nested<std::valarray<double>>(x3);

    // this returns 21
    c  = 2*a  + 3*b;
    std::cout << c.get_a()[0] << std::endl;

    // works as expected, returns 17
    x4 = 2*x1 + 3*x2;
    std::cout <<        x4[0] << std::endl;
}

The output of this program is

21
17

i.e. forwarding the expression down to the member does not seem to provide the expected result obtained directly from using the valarrays.

Any help here is appreciated.

Solution

In the below function definition:

inline auto get_a() const { return arg.get_a() * s; };

your expected behavior is that auto deduces std::valarray<double>, that is, the result type of a multiplication of std::valarray<double> and int which is a new object that already stores values multiplied by the integer.

This is how operator* is defined [valarray.binary]/p2:

template <class T>
valarray<T> operator*(const valarray<T>&,
                      const typename valarray<T>::value_type&);

However, there's the following paragraph in the standard [valarray.syn]/p3:

Any function returning a valarray<T> is permitted to return an object of another type, provided all the const member functions of valarray<T> are also applicable to this type. This return type shall not add more than two levels of template nesting over the most deeply nested argument type.

This type must be convertible to std::valarray<double>, but itself, for optimization purposes, may not represent the actual result before that conversion happens.

That is, here's the actual type deduced for auto by GCC:

std::_Expr<std::__detail::_BinClos<std::__multiplies
                                 , std::_ValArray
                                 , std::_Constant, double, double>, double>

and here's what Clang uses:

std::__1::__val_expr<std::__1::_BinaryOp<std::__1::multiplies<double>, 
                     std::__1::valarray<double>, std::__1::__scalar_expr<double> > >

In other words, you are returning by value an object which probably defers the actual computations. In order to do so, those intermediate objects need to store somehow the deferred subexpressions.

Inspecting GCC libstdc++'s implementation, one can find the following representation:

template <class _Oper, class _FirstArg, class _SecondArg>
class _BinBase
{
public:
    typedef typename _FirstArg::value_type _Vt;
    typedef typename __fun<_Oper, _Vt>::result_type value_type;

    _BinBase(const _FirstArg& __e1, const _SecondArg& __e2)
    : _M_expr1(__e1), _M_expr2(__e2) {}

    // [...]

private:
    const _FirstArg& _M_expr1;
    const _SecondArg& _M_expr2;
};

Note that subexpressions are stored as references. This means that in the definition of get_a():

return arg1.get_a() + arg2.get_a();

_M_expr1 and _M_expr2 are bound to temporary objects:

arg1.get_a()
arg2.get_a()

i.e., intermediate objects which are the results of multiplications, whose lifetime ends as soon as NextedAdd::get_a() exits, leading to undefined behavior when the result is eventually computed, in particular, when the implementation attempts to access each individual element of that intermediate subexpressions:

value_type operator[](size_t __i) const
{
    return _Oper()(_M_expr1[__i], _M_expr2[__i]);
}

A quick solution would be to use the following return type:

std::decay_t<decltype(arg.get_a())> get_a() const { return arg.get_a() * s; }

This will recursively ensure that the final result type of any operation will be whatever the original type T in Nested<T> was, i.e., std::valarray<double>.

DEMO