opencv cuda

How does cv::cuda::GpuMat turn into cv::cuda::PtrStepSz when passed to a kernel?


I like how GpuMat can be passed into a kernel as PtrStepSz. I want to emulate that behavior in my own custom container, but I don't understand what's actually going on. How can GpuMat be accessed through PtrStepSz in the kernel? And how does PtrStepSz have members like .rows and .cols if CUDA can't use classes? What actually is PtrStepSz? I've been studying the source code, but I'm having trouble contextualizing it.


Solution

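  • GpuMat can be converted to PtrStepSz because the GpuMat class defines a templated conversion operator that extracts the core members of GpuMat (i.e. rows, columns, step and data) into a PtrStepSz. When a GpuMat is passed where a PtrStepSz<T> is expected, the compiler calls this operator implicitly on the host. The resulting PtrStepSz is a small struct that is passed to the kernel by value; the pointer it carries still refers to the device memory owned by the GpuMat. The operator can be seen in the GpuMat source code, where it is declared as a member of the GpuMat class as follows: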

    template <typename _Tp> operator PtrStepSz<_Tp>() const;
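
To illustrate the conversion mechanism on its own, here is a minimal sketch with no OpenCV or CUDA dependency. The names View, Image, fill and sum are hypothetical stand-ins (View for PtrStepSz, Image for GpuMat), and the container owns host memory only so that the sketch runs without a GPU; a real container would presumably own a device allocation instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical lightweight view (stand-in for PtrStepSz): plain data, cheap to copy.
template <typename T>
struct View
{
    T*          data;
    std::size_t step;  // distance between consecutive rows, in bytes
    int         rows;
    int         cols;
};

// Hypothetical owning container (stand-in for GpuMat). It is untyped, like GpuMat,
// and owns host memory here only so the sketch can run without a GPU.
class Image
{
public:
    Image(int rows, int cols, std::size_t elemSize)
        : rows_(rows), cols_(cols), step_(cols * elemSize),
          bytes_(new unsigned char[rows * step_]()) {}

    // Templated conversion operator, same shape as GpuMat's operator PtrStepSz<_Tp>().
    // The compiler deduces T from the type the call site asks for.
    template <typename T>
    operator View<T>() const
    {
        assert(step_ >= cols_ * sizeof(T));  // requested element type must fit in a row
        return View<T>{ reinterpret_cast<T*>(bytes_.get()), step_, rows_, cols_ };
    }

private:
    int                              rows_;
    int                              cols_;
    std::size_t                      step_;
    std::unique_ptr<unsigned char[]> bytes_;
};

// Row y of the view: step is in bytes, so offset through a byte pointer.
inline float* rowPtr(View<float> v, int y)
{
    return reinterpret_cast<float*>(reinterpret_cast<unsigned char*>(v.data) + y * v.step);
}

// Both functions take the view by value, the way a kernel takes PtrStepSz<float>.
inline void fill(View<float> v, float value)
{
    for (int y = 0; y < v.rows; ++y)
        for (int x = 0; x < v.cols; ++x)
            rowPtr(v, y)[x] = value;
}

inline float sum(View<float> v)
{
    float total = 0.0f;
    for (int y = 0; y < v.rows; ++y)
        for (int x = 0; x < v.cols; ++x)
            total += rowPtr(v, y)[x];
    return total;
}
```

With these definitions, a call such as fill(img, 1.5f) on an Image compiles without any explicit cast: the compiler sees that fill wants a View<float>, finds the conversion operator, and deduces T = float. Only the small view struct is copied; the pixel buffer is not.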
    

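    Coming to the second question, the assumption that CUDA cannot use classes is incorrect. CUDA C++ supports classes and structs in device code: objects can be constructed and their member functions called inside a kernel, provided the constructors, destructor and member functions involved are decorated with the __device__ qualifier (or __host__ __device__ if they are also needed on the host). In the OpenCV source code, PtrStepSz is defined as follows in the file cuda_types.hpp: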

    template <typename T> struct PtrStepSz : public PtrStep<T>
    {
       __CV_CUDA_HOST_DEVICE__ PtrStepSz() : cols(0), rows(0) {}
       __CV_CUDA_HOST_DEVICE__ PtrStepSz(int rows_, int cols_, T* data_, size_t step_)
                    : PtrStep<T>(data_, step_), cols(cols_), rows(rows_) {}
    
       template <typename U>
       explicit PtrStepSz(const PtrStepSz<U>& d) : PtrStep<T>((T*)d.data, d.step), cols(d.cols), rows(d.rows){}
    
       int cols;
       int rows;
    
       CV_NODISCARD_STD __CV_CUDA_HOST_DEVICE__ Size size() const { return {cols, rows}; }
       CV_NODISCARD_STD __CV_CUDA_HOST_DEVICE__ T& operator ()(const Point &pos)       { return (*this)(pos.y, pos.x); }
       CV_NODISCARD_STD __CV_CUDA_HOST_DEVICE__ const T& operator ()(const Point &pos) const { return (*this)(pos.y, pos.x); }
       using PtrStep<T>::operator();
    };
    

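    Note that the constructors and accessors are decorated with __CV_CUDA_HOST_DEVICE__, a macro that expands to the __host__ __device__ qualifiers when the file is compiled with nvcc, so objects can be created and used on the host as well as on the device. Beyond that, PtrStepSz is just a lightweight wrapper that encapsulates image information (data pointer, step, rows and cols), as the following comment in cuda_types.hpp says: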

    // Simple lightweight structures that encapsulates information about an image on device.
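
To emulate this in a custom container, one possible approach is to define a similar view struct of your own. The sketch below makes several assumptions: MY_HOST_DEVICE, MyPtrStepSz, scaleKernel and scaleHost are hypothetical names that do not come from OpenCV, and the macro is guarded by __CUDACC__ so that the same file also builds as plain C++. The host function mirrors the kernel body so the indexing can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Same idea as __CV_CUDA_HOST_DEVICE__: real qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
    #define MY_HOST_DEVICE __host__ __device__
#else
    #define MY_HOST_DEVICE
#endif

// Hypothetical view struct: a pointer plus layout information, no ownership.
// It has no virtual functions and is trivially copyable, so it can be passed
// to a kernel by value. T is assumed to be a non-const element type.
template <typename T>
struct MyPtrStepSz
{
    MY_HOST_DEVICE MyPtrStepSz() : data(nullptr), step(0), cols(0), rows(0) {}
    MY_HOST_DEVICE MyPtrStepSz(int rows_, int cols_, T* data_, std::size_t step_)
        : data(data_), step(step_), cols(cols_), rows(rows_) {}

    // step is in bytes (rows may be padded), so offset through a char pointer.
    MY_HOST_DEVICE T* ptr(int y) const
    {
        return reinterpret_cast<T*>(reinterpret_cast<char*>(data) + y * step);
    }
    MY_HOST_DEVICE T& operator()(int y, int x) const { return ptr(y)[x]; }

    T*          data;
    std::size_t step;
    int         cols;
    int         rows;
};

#ifdef __CUDACC__
// Compiled only by nvcc. The view arrives by value; data must point to device memory.
__global__ void scaleKernel(MyPtrStepSz<float> img, float k)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img.cols && y < img.rows)
        img(y, x) *= k;
}
#endif

// Host reference with the same per-pixel logic as the kernel above.
inline void scaleHost(MyPtrStepSz<float> img, float k)
{
    assert(img.data != nullptr);
    for (int y = 0; y < img.rows; ++y)
        for (int x = 0; x < img.cols; ++x)
            img(y, x) *= k;
}
```

A custom container would then expose a conversion operator returning MyPtrStepSz<T> built from its own device pointer, pitch, rows and cols. The view deliberately owns nothing: copying it into the kernel arguments copies only a pointer and three integers, and the container stays responsible for freeing the memory.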