c++vectormemory-managementstrict-aliasingtype-punning

Adversarial test against template generic STL vector pool


I have been coding C++ for many years. Most of my apps are built for solving problems in computational science/statistics/machine learning, so I have been extensively dealing with std::vectors storing large data. During these years, there's always an unconformable feeling like a splinter on the back of my head whenever I have the need for allocating temporary std::vectors inside a function. It incurs large overhead if the function is frequently called, and gets worse if the container is large or if the function runs in many threads.

My usual tactic is to write functions as functors, and set the temporary vectors as data members so they will be reused if the next call demands no greater capacity. If the memory resource is limited, I would need to manually scan the code, seek and elicit vectors of the same type elsewhere to reuse them.

For some larger apps, I often write a standalone struct VecPool that contains vector< vector<typeNeeded> >s. Temporary containers are fetched and returned during computation. In this case, I need to add new members to VecPool when new types show up, and it becomes messy or outright impossible if I want to templatize my class/function. The memory recycling efficiency is also low because containers of different types cannot be shared.

C++ standard pretty much prevents reusing a memory buffer initialized with a different type. For years I have been wanting to have a VecPool that recycles temporary std::vectors in the most parsimonious manner. Recently I finally made the determination to remove the splinter once for all.

I have the following demands:

  1. auto x = VecPool::give<T>(size1, ...) will send me a vector or a nested vector. The innermost data type is T. For example, auto x = VecPool::give<std::pair<double, std::set<int, float>>>(3, 2, 8, 5); should yield a vector<vector<vector<vector<std::pair<double, std::set<int, float>>>>>>. In total, x is a composition of 1 + 3 + 3 x 2 + 3 x 2 x 8 = 58 vectors of various types. Function VecPool::give should fetch 58 vectors from the pool, construct x and return it.

  2. VecPool::recall(x) will extract the 58 vectors in x, and push them back to the pool in the same order of them being pulled previously.

To achieve my goal, I used std::vector<std::vector<char>> as the pool container with a little hack. The implementation has minimal violation to the C++ standard. Here are some important considerations:

  1. Standard >= C++17.

  2. Memory alignment. This will not be an issue since the standard requires malloc return address that is at least sizeof(size_t) byte aligned.

  3. How std::vector is implemented. The pool constructor should static_assert that the layout of vector's header must be a 3-pointer. After the pointers are converted to integers i0, i1, i2, i1 - i0 must equal the vector's size times the byte size of the element type; i2 - i0 must equal the vector's capacity times the byte size of the element. If these requirements are not fulfilled, throw a runtime error.

  4. When a vector<char> in the pool is adjusted to contain T before being dispatched, we need to adjust the 3rd pointer in the vector header to ensure the byte capacity is a multiple of the byte size of the element.

The following is my implementation. Here is a copy.

#include <vector>
#include <cstring>
#include <stdlib.h>
#include <time.h>


template<typename DestPtr, typename SourPtr>
inline void mmcp(DestPtr dest, SourPtr src, std::size_t Nbyte)
{ 
  std::memcpy((void*)dest, (void*)src, Nbyte);
}


template<typename T>
struct isVector { constexpr bool operator()() { return false; } };


template<typename T, typename A>
struct isVector<std::vector<T, A>> 
{ 
  constexpr bool operator()() { return true; };
};


// =============================================================================
// VecPool is not thread-safe. Create a VecPool for each thread.
// =============================================================================
class VecPool
{
private:
  std::vector<std::vector<char>> X;
  
  
  // ===========================================================================
  // Convert vector<T> to vector<char> that will be throw in pool.
  // ===========================================================================
  template<typename T>
  std::vector<char> T2char(std::vector<T> &x)
  { 
    static_assert(sizeof(x) == sizeof(std::vector<char>));
    x.resize(0); // Call destructors on all elements but keep capacity.
    std::vector<char> rst;
    char mem[sizeof(x)]; // Start swapping.
    mmcp(mem,  &rst, sizeof(x));
    mmcp(&rst, &x,   sizeof(x));
    mmcp(&x,   mem,  sizeof(x));
    return rst;
  }
  
  
  // ===========================================================================
  // Convert vector<char> in pool to vector<T>.
  // ===========================================================================
  template<typename T>
  std::vector<T> char2T(std::vector<char> &x)
  {  
    static_assert(sizeof(x) == sizeof(std::vector<T>));
    std::vector<T> rst;
    std::size_t mem[3];
    mmcp(mem, &x, sizeof(x));
    // =========================================================================
    // Adjust the capacity pointer to ensure the capacity is a multiple 
    //   of sizeof(T):
    // =========================================================================
    mem[2] -= (mem[2] - mem[0]) % sizeof(T);
    mmcp(&x,  &rst, sizeof(x));
    mmcp(&rst, mem, sizeof(x));
    return rst;
  }
  
  
  // ===========================================================================
  // Dispatch core function.
  // ===========================================================================
  template<typename T>
  std::vector<T> giveCore(std::size_t size)
  {
    if (X.size() == 0) 
    {
      std::vector<T> rst(size + 1); // Always prefer a little bit larger capacity.
      rst.pop_back();
      return rst;
    }
    auto rst = char2T<T> (X.back());
    rst.resize(size + 1); // Always prefer a little bit larger capacity.
    rst.pop_back();
    X.pop_back();
    return rst;
  }
  
  
  // ===========================================================================
  // Recall the vector of arbitrary type. The function will first clear
  // the vector --- call desctructor on all elements, and then take the
  // container back to pool.
  // ===========================================================================
  template<typename T>
  void recallCore(std::vector<T> &x) // Invalidates everything related to x.
  { 
    x.resize(0); // Call destructor on all elements.
    X.emplace_back(std::vector<char>(0));
    T2char(x).swap(X.back());
  }
  
  
public:
  VecPool()
  {
    std::srand(std::time(nullptr));
    test0();
    test1();
    test2();
    std::vector<std::vector<char>>(0).swap(X);
  }
  
  
  template <typename T, typename... Args>
  auto give(std::size_t size, Args... restSizes)
  {
    if constexpr (sizeof...(restSizes) == 0) return giveCore<T> (size);
    else 
    { 
      auto rst = giveCore< decltype( give<T>(restSizes...) ) > (size);
      for (auto &x: rst) give<T>(restSizes...).swap(x);
      return rst;
    }
  }
  
  
  template <typename T> 
  void recall(std::vector<T> &x)
  {
    if constexpr (isVector<T>()())
    { 
      for (auto i = x.rbegin(); i != x.rend(); ++i) recall(*i);  
      recallCore(x);
    }
    else recallCore(x);
  }
  
  
  void test0(); // Test if std::vector header is implemented as 3-pointer.
  void test1(); // Test if vectors fetched from or recalled to VecPool are handled correctly.
  void test2(); // Test if nested vectors can be handled correctly.
};


void VecPool::test0()
{
  static_assert( sizeof(std::size_t) == sizeof(void*) );
  
  
  // ===========================================================================
  // Test if std::vector is implemented in the way that can be exploited.
  // ===========================================================================
  unsigned fullSize = std::rand() % 31 + 17;
  unsigned subSize = fullSize - std::rand() % 13;
  
  
  typedef std::tuple<char, char, char> tupe; // Arbitrarily selected type.
  std::vector<tupe> a(fullSize); // Just for creating a vector of an arbitrary type.
  a.resize(subSize); // Downsize the vector.
  
  
  // ===========================================================================
  // To make VecPool work, vector header must contain exactly 3 pointers, 
  //   e.g. p1, p2, p3.
  // ===========================================================================
  static_assert( sizeof(a) == sizeof(std::size_t) * 3 );
  auto ptr = (std::size_t*)(&a); // Read but won't write.
  bool sizePointerCool     = ptr[0] + sizeof(tupe) * subSize  == ptr[1];
  bool capacityPointerCool = ptr[0] + sizeof(tupe) * fullSize == ptr[2];
  
  
  if (!sizePointerCool or !capacityPointerCool) throw std::runtime_error(
      "VecPool initialization: Layout of std::vector header is unsupported.\n");
}




// ===========================================================================
// Test 1 to run during initialization. Check memory addresses and if the 
//   container dispatched from VecPool can call push()/emplace_back().
// ===========================================================================
void VecPool::test1()
{
  std::size_t asize = std::rand() % 7 + 3;
  auto a = give<float> (asize);
  a.resize(asize);
  void *aptr = a.data();
  
  
  int bsize = std::rand() % 5 + 3;
  auto b = give<std::tuple<short, short, short> >(bsize);
  void *bptr = b.data();
  
  
  recall(b);
  recall(a);
  
  
  if ( (void*)(X.back().data()) != aptr ) throw std::runtime_error(
    "VecPool initialization: Last in pool is a different container.\n");
  
  
  if ( (void*)(X[X.size() - 2].data()) != bptr ) throw std::runtime_error(
    "VecPool initiailization: Second last in pool is a different container.\n");
  
  
  typedef std::tuple<char, char, char, char, char, char, char> tupe;
  int lastContainerByteCapa = X.back().capacity();
  int pushTimes = lastContainerByteCapa / sizeof(tupe) + 5;
  auto c = give<tupe>(0);
  
  
  // =========================================================================
  // Test emplace_back() beyond the vector's capacity.
  // =========================================================================
  for (int i = 0; i < pushTimes; ++i) c.emplace_back(tupe());
  int dsize = c.capacity() - c.size() + 3;
  
  
  auto d = give<double>(dsize);
  auto dptr = d.data() + 1;
  recall(d);
  if (std::size_t(X.back().data()) + sizeof(double) != std::size_t(dptr) ) 
    throw std::runtime_error(
        "VecPool initiailization: Last in pool is a different container -- 2.\n");
}


// Find address of the first element in a vector. Used in test2().
template <typename T>
void FindAddrsOfAllVectorsInNestedVector(T &x, std::vector<std::size_t> &addrs)
{
  if constexpr (!isVector<T>()()) return;
  else
  {
    static_assert( sizeof(std::size_t) == sizeof(x.data()) );
    addrs.emplace_back(std::size_t(x.data()));
    for (auto &u: x) FindAddrsOfAllVectorsInNestedVector(u, addrs);
  }
}


// ===========================================================================
// Test 2 to run during initialization. Test if VecPool correctly deals with
//   nested vectors.
// ===========================================================================
void VecPool::test2()
{
  int a[7];
  for (int i = 0; i < 7; ++i) a[i] = rand() % 3 + 1;
  auto x = give<float> ( // x is a 7-d vector.
    a[0], a[1], a[2], a[3], a[4], a[5], a[6]);
  std::vector<std::size_t> addresses;
  FindAddrsOfAllVectorsInNestedVector(x, addresses);
  recall(x);
  for (int i = X.size() - 1, j = 0; i >= 0; --i, ++j)
  { 
    auto addr = std::size_t(X[i].data());
    if (addresses[j] != addr) throw std::runtime_error(
      "VecPool initialization: Nested vector test failed.\n");
  }
} 


// =============================================================================
// Usage pattern goes like this:
// =============================================================================
// VecPool vp;
// auto a = vp.give<int> (3);
// auto b = vp.give<double>(5, 7);
// auto c = vp.give<std::tuple<short, short, short> >(2);
// 
// ......
// 
// vp.recall(c); // Recalls are not necessary but good to do.
// vp.recall(b);
// vp.recall(a);
// =============================================================================
// It is strongly recommended that the first dispatched gets recalled last,
//   so the next code run will always utilize the same containers.
// =============================================================================


// ===========================================================================
// Code pattern for creating a vector of vectors of different sizes.
// ===========================================================================
// auto v = vp.give< std::vector<std::vector<int>> > (3); // A vector of vector of vectors.
// for (int i = 0; i < 3; ++i)
// {
//   vp.give<std::vector<int>>(i * 2 + 1).swap(v[i]);
//   for (int j = 0, jend = v[i].size(); j < jend; ++j)
//     vp.give<int>(j + 5).swap(v[i][j]);
// }

I have run extensive tests and have plugged it in my projects. There has been no problem so far.

Besides the obvious problem of the code, can anyone write a standard-conformant test case that will break it, i.e. make it go wrong? If you have better ideas of achieving what I want, I would be more than happy to hear about them!

Replies to some comments:

My goal is not just about reusing memory, but really reusing vectors which have so many member functions that I like, e.g. emplace_back(). Of course it is not the most efficient function if you did not reserve space, but there are many situations where you cannot guess a sufficient capacity.

Nested vectors are indeed not the most efficient, but there are many occasions when using data cube is a massive waste of memory or downright infeasible.


Solution

  • Your code is super dangerous because you're memcpying around internals of std::vector, which is... not safe. But your real goal appears to be simply to reuse memory rather than reusing vectors. C++ actually has built-in facilities for this: customizable allocators. What you're really wanting is a pool allocator, such as those found in boost.

    If, however, you want to make your code work, then what you want is for something like this to work:

    VecPool vp;
    
    std::vector<int, VecPoolAllocator<int>> a = vp.get<int>(3);
    

    And then the std::vector will use the vp for allocating and deallocating memory. You'll want to make a VecPool object that can maintain a list of freed memory and attempt to reuse that memory for subsequent memory allocations. Something like this:

    
    class VecPool {
    private:
        std::vector<std::pair<std::size_t,std::unique_ptr<char[]>>> X;
    public:
        char* allocate(std::size_t size) {
            for(auto it=X.begin(); it!=X.end(); it++) {
                if (it->first >= size) {
                    char* memory = it->second.release();
                    X.erase(it);
                    return memory;
                }
            }
            return new char[size];
        }
        void deallocate(char* ptr, std::size_t size) noexcept {
            // If this fails to allocate, thats undefined behavior! 
            X.emplace_back(size, std::unique_ptr<char[]>{ptr});
        }
    }
    

    You'll also want an allocator object that tells the std::vector how to interact with your VecPool:

    template<class T>
    class VecPoolAlloc {
        VecPool* base_;
    public:
        using value_type = T;
        using is_alwyas_equal = std::false_type;
        using propagate_on_container_copy_assignment = std::true_type;
        using propagate_on_container_move_assignment = std::true_type;
        using propagate_on_container_swap = std::true_type;
        
        VecPoolAlloc(const VecPool& base) noexcept : base_(const_cast<VecPool*>(&base)) {}
        template<class U> VecPoolAlloc(const VecPoolAlloc<U>& rhs) noexcept : base_(rhs.base_) {}
        template<class U> VecPoolAlloc& operator=(const VecPoolAlloc<U>& rhs) noexcept {base_=rhs.base_; return *this;}
        T* allocate(std::size_t size) {return reinterpret_cast<T*>(base_->allocate(size*sizeof(T)));}
        void deallocate(T* ptr, std::size_t size) noexcept {base_->deallocate(reinterpret_cast<char*>(ptr), size*sizeof(T));}
        template<class U> bool operator==(const VecPoolAlloc<U>&rhs) noexcept {return base_==rhs.base_;}
    };
    

    And that's the basic functionality. You just have to make sure to always pass the vp as the last parameter to the std::vector constructors.


    However, the need to write std::vector<std::vector<int, VecPoolAllocator<int>>, VecPoolAllocator<std::vector<int, VecPoolAllocator<int>>>> and similar is obnoxious for parameters and return types and such, so since you mention that nested vectors is common for your use case, you might as well make helper methods for those:

        template<class T, std::size_t depth> struct pool_vec_finder;
        template<class T> struct pool_vec_finder<T,1> {
            using value_type = T;
            using type = std::vector<value_type, VecPoolAlloc<value_type>>;
        };
        template<class T, std::size_t depth> struct pool_vec_finder {
            using value_type = pool_vec_finder<T, depth-1>::type; 
            using type = std::vector<value_type, VecPoolAlloc<value_type>>;
        };
        template<class T, std::size_t depth=1>
        using pool_vector = pool_vec_finder<T, depth>::type;
    

    And then you can pass around pool_vector<int,2> instead, which is far easier to type and understand.


    Oh, and you wanted give<T>(size_t...) methods as well? Done.

        template<class T>
        pool_vector<T> give(std::size_t count) {
            return pool_vector<T>(count, *this);
        }
        template<class T, class...sizets>
        pool_vector<T, sizeof...(sizets)+1> give(std::size_t count, sizets...sizes)
        {
            pool_vector<T, sizeof...(sizets)+1> temp(count, give<T>(sizes...), *this);
            return temp;
        }
    

    http://coliru.stacked-crooked.com/a/ccfc359a1819872f


    As a final note, since this is an abstract allocator, you can actually use it with other structures too.

    VecPool vp;
    std::list<int, VecPoolAllocator<int>> myList(vp);
    std::dequeue<int, VecPoolAllocator<int>> myDeque(vp);
    std::set<int, std::less<int>, VecPoolAllocator<int>> mySet(vp);
    std::map<int, char, std::less<int>, VecPoolAllocator<std::pair<const int, char>>> myMap(vp);
    

    And it'll simply reuse memory allocations between all of these, no problem.