c++ cuda memory-pool

How to create a globally-accessible variable in CUDA?


This is a fairly complicated question, and I'm not a native English speaker, so thank you in advance if you are patient enough to read it.

Since CUDA effectively operates on two separate machines (host and device) with distinct address spaces, it is invalid to dereference a host pointer from device code. This means you cannot simply copy structs (or objects) to the device if they have pointer members.
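A tiny sketch of the problem (this code is illustrative, not from the question; the CUDA qualifiers are stubbed out so the snippet also compiles as plain C++):

```cpp
// When building without nvcc, stub out the CUDA qualifiers so this
// sketch still compiles as plain C++.
#ifndef __CUDACC__
#define __global__
#define __device__
#endif

struct Wrapped { int* data; };       // a struct with a pointer member

__global__ void read_first(Wrapped w, int* out)
{
    // w is copied by value to the device, so w.data still holds a HOST
    // address; dereferencing it inside a real kernel would be invalid.
    *out = w.data[0];
}
```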

I tried to make the following system to solve this issue:

  1. Use integers instead of pointers. Each integer is an offset into a memory pool, and it is wrapped in a class (which overloads `->` and `*`) to make it look like a pointer.
  2. The memory pool object manages a contiguous array of objects, which can easily be transferred to the CUDA device. The pool's contents are synchronized between host and device, so an integer offset has the same meaning on both sides.

To make the offset convenient to use, it is wrapped. On the host side, the wrapper looks like this:

template<typename T>
class MemPoolPointer {
public:
    inline T* operator -> () const
    {
        return &( MemPool<T>::get_instance().get_object(_p) );
    }
    uint64_t _p;
};
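For context, a self-contained host-side sketch of this scheme might look as follows (the `MemPool` class here is an assumed implementation; the question only shows the wrapper):

```cpp
#include <cstdint>
#include <vector>

// Assumed pool implementation: a singleton owning a contiguous array.
template<typename T>
class MemPool {
public:
    static MemPool& get_instance() {           // classic singleton accessor
        static MemPool instance;
        return instance;
    }
    uint64_t allocate(const T& value) {
        _objects.push_back(value);
        return _objects.size() - 1;            // an offset, not an address
    }
    T& get_object(uint64_t p) { return _objects[p]; }
    // _objects.data() is the contiguous block that would be cudaMemcpy'd.
private:
    std::vector<T> _objects;
};

// Offset wrapper that behaves like a pointer on the host.
template<typename T>
class MemPoolPointer {
public:
    explicit MemPoolPointer(uint64_t p) : _p(p) {}
    inline T* operator -> () const {
        return &( MemPool<T>::get_instance().get_object(_p) );
    }
    inline T& operator * () const {
        return MemPool<T>::get_instance().get_object(_p);
    }
    uint64_t _p;
};

// Objects link to each other by offset, so the whole array stays
// meaningful after being copied byte-for-byte to the device.
struct Node { int value; uint64_t next; };
```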

As we can see, the pointer class requires global access to the memory pool. This is usually implemented by making the memory pool a singleton. However, CUDA does not allow static `__device__` data members, and it restricts `__device__` variables to file scope. How can I work around these limitations? Or should I try OpenCL instead?


Solution

  • The OP was able to solve this by wrapping a file-scope __device__ variable in a static class method, like this:

    class FooBar;
    __device__ FooBar* FOOBAR_DEVICE_POOL;
    class FooBar
    {
    public:
        __device__ static FooBar& DEVICE_GET(uint64_t p);
    };
    
    template<typename T>
    class MemPoolPointer {
    public:
        inline T* operator -> () const
        {
    #ifdef __CUDA_ARCH__
            return &( T::DEVICE_GET(_p) );
    #else
            return &( MemPool<T>::get_instance().get_object(_p) );
    #endif
        }
        uint64_t _p;
    };
    

    [this answer added as a community wiki entry to get the question off the unanswered queue for the CUDA tag]
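To complete the picture, the host still has to allocate the device copy of the pool and point `FOOBAR_DEVICE_POOL` at it through the CUDA runtime symbol API. A possible sketch of that step (the `upload_pool` function and the `DEVICE_GET` body are assumptions, not from the answer):

```cuda
#include <cstddef>
#include <cstdint>

class FooBar;
__device__ FooBar* FOOBAR_DEVICE_POOL;     // file-scope device global

class FooBar
{
public:
    __device__ static FooBar& DEVICE_GET(uint64_t p)
    {
        return FOOBAR_DEVICE_POOL[p];      // index into the uploaded array
    }
};

// Host side: copy the pool's contiguous storage to the device, then make
// the device global point at it via cudaMemcpyToSymbol.
void upload_pool(const FooBar* host_data, size_t count)
{
    FooBar* d_data = nullptr;
    cudaMalloc(&d_data, count * sizeof(FooBar));
    cudaMemcpy(d_data, host_data, count * sizeof(FooBar),
               cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(FOOBAR_DEVICE_POOL, &d_data, sizeof(d_data));
}
```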