Accessing column slices of 2D valarrays

Consider the following code snippet:

#include <cstdint>   // uint32_t
#include <iostream>
#include <valarray>

using namespace std;

std::ostream & operator<<(std::ostream & out, const std::valarray<int> inputVector);
typedef std::valarray<std::valarray<int> > val2d;

int main()
{
    val2d g(std::valarray<int>(10),4);

    for (uint32_t n=0; n<4; ++n){
        for (uint32_t m=0; m<10; ++m){
            g[n][m] = n*10 + m;
        }
    }  
   std::valarray<int> g_slice_rs = g[1][std::slice(0,10,1)];  // row slice
   //std::valarray<int> g_slice_cs = g[std::slice(0,1,3)][0];   // column slice (comment out)

   cout<<"row slice :: "<<g_slice_rs<<endl; 
   //cout<<"column slice :: "<<g_slice_cs<<endl; // (comment out)
   return 0;
}

std::ostream & operator<<(std::ostream & out, const std::valarray<int> inputVector)
{
  uint32_t vecLength = inputVector.size();
  out<<"[";
  for (uint32_t i=0; i<vecLength; ++i)
  {
    out <<inputVector[i]<<", ";
  }
  out<<"]"<<endl;
  return out;
}

Here I'm able to access the row slices, but not the column slices (see the commented-out lines). Is there any workaround to access column slices? This thread does not provide the answer.

Solution

First off, you don't have a 2D valarray. You have a valarray of valarrays, a difference you should not ignore.

x = g[m][n];

only looks like an array-style access. It's really closer to

temp = g[m];
x = temp[n];

A valarray's datastore is a nice contiguous block of memory, but if you have an M by N structure, you have M+1 valarrays potentially scattered throughout memory. This can turn into a nightmare of performance-killing cache misses.

You are going to have to decide which is more important to be fast, row slicing or column slicing, because only one will be going with the flow of memory and the other requires a cache-thrashing copy against the grain.

Currently

g[1][std::slice(0,10,1)];

works because it is slicing a contiguous block of memory, and

g[std::slice(0,1,3)][0]

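fails because a column has to reach across M distinct valarrays, and std::slice can't do that. Slicing the outer valarray only selects whole rows, and the std::slice_array you get back has no operator[] to index into them. You will have to manually copy the elements you want from each of the valarrays that make up the column. Sucks, huh?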
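If you stay with the valarray of valarrays, the manual copy could look roughly like this. This is only a sketch: copy_column is a made-up helper name (nothing in the standard library provides it), and it assumes every row is long enough to contain the requested column.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical helper: gathers column `col` from a valarray of row valarrays
// by copying one element out of each row.
std::valarray<int> copy_column(const std::valarray<std::valarray<int>>& rows,
                               std::size_t col)
{
    std::valarray<int> column(rows.size());
    for (std::size_t r = 0; r < rows.size(); ++r) {
        assert(col < rows[r].size());  // assumption: every row has this column
        column[r] = rows[r][col];      // one hop into a separate allocation per row
    }
    return column;
}
```

With the g from the question, copy_column(g, 0) would take the place of the commented-out column slice. Each iteration touches a different row's storage, which is exactly the against-the-grain access described above.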

So what do you do?

You fake it! Muhuhahahahahahahaha!

Don't make a valarray of valarrays. Make one big valarray of size MxN. So say goodbye to

std::valarray<std::valarray<int> > g(std::valarray<int>(10),4);

and hello to

std::valarray<int> g(10*4);

Now you can take advantage of std::slice's stride parameter (the arguments are start, size, stride) to grab every tenth element, starting at the column you want

std::slice(column_to_slice,4,10);

And as an added bonus you now have one contiguous block of memory so at least some of that cache-grinding abuse should be mitigated. You're still smurfed if the stride is too large.
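To make the index arithmetic concrete, here is a minimal sketch of both slices on the flat layout. The helper names row_of and column_of are invented for illustration, and the code assumes row-major storage, i.e. element (r, c) lives at index r * cols + c.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical helpers for a flat, row-major rows x cols grid.

// Row r: `cols` consecutive elements starting at r * cols (stride 1).
std::valarray<int> row_of(const std::valarray<int>& g,
                          std::size_t rows, std::size_t cols, std::size_t r)
{
    assert(g.size() == rows * cols && r < rows);
    return g[std::slice(r * cols, cols, 1)];
}

// Column c: `rows` elements starting at c, one every `cols` elements.
std::valarray<int> column_of(const std::valarray<int>& g,
                             std::size_t rows, std::size_t cols, std::size_t c)
{
    assert(g.size() == rows * cols && c < cols);
    return g[std::slice(c, rows, cols)];
}
```

For the 4 by 10 case in the question, column_of(g, 4, 10, column_to_slice) builds the same std::slice(column_to_slice,4,10) shown above.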

I whole-heartedly recommend wrapping this in an object to make access and management easier. Something like this, except you use the valarray instead of the raw pointer.
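A wrapper along those lines might look something like the following. Treat it as a rough sketch rather than a finished design: the class name Matrix2D and every member on it are my own invention, it only handles int, and the bounds checks are plain asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical wrapper: one flat, row-major valarray behind a 2D interface.
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // elements start at 0

    // Element access: (r, c) maps to index r * cols_ + c.
    int& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    int operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Copy of row r: contiguous, stride 1.
    std::valarray<int> row(std::size_t r) const {
        assert(r < rows_);
        return data_[std::slice(r * cols_, cols_, 1)];
    }

    // Copy of column c: one element per row, stride cols_.
    std::valarray<int> column(std::size_t c) const {
        assert(c < cols_);
        return data_[std::slice(c, rows_, cols_)];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::valarray<int> data_;
};
```

Usage would then be Matrix2D g(4, 10); g(n, m) = n*10 + m; followed by g.row(1) or g.column(0). Keeping the index arithmetic in one place means the rest of the code never needs to know the storage is flat.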