c++cudastackptxas

How to overcome Stack size warning?


I would like to know the best practice concerning the following type of warning:

ptxas warning : Stack size for entry function '_Z11cuda_kernelv' cannot be statically determined

It appears adding the virtual keyword to the destructor of Internal, i.e. moving from __device__ ~Internal(); to __device__ virtual ~Internal(); in the following programme:

template<typename T>
class Internal {
  T val;
public:
  __device__ Internal();
  __device__ virtual ~Internal();
  __device__ const T& get() const;
};

template<typename T>
__device__ Internal<T>::Internal(): val() {}
template<typename T>
__device__ Internal<T>::~Internal() {}
template<typename T>
__device__ const T& Internal<T>::get() const { return val; }


template<typename T>
class Wrapper {
  Internal<T> *arr;
public:
  __device__ Wrapper(size_t);
  __device__ virtual ~Wrapper();
};

template<typename T>
__device__ Wrapper<T>::Wrapper(size_t len): arr(nullptr) {
  printf("%s\n", __PRETTY_FUNCTION__);
  arr = new Internal<T>[len];
}

template<typename T>
__device__ Wrapper<T>::~Wrapper() {
  delete[] arr;
}

__global__ void cuda_kernel() {
  Wrapper<double> *wp = new Wrapper<double>(10);
  delete wp; 
}

int main() {
  cuda_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}

Having faced with the warning shown above, I wonder what I should do in this case?


Solution

  • The very short answer is that there is nothing you can do about this particular warning.

    In more detail:

    1. This warning is an assembler warning, not a compiler warning
    2. The NVIDIA toolchain relies on a lot of assembler level optimizations to produce performant SASS machine code that runs on the silicon. The NVIDIA compiler emits PTX virtual machine language which can undergo significant transformation when assembled. This includes resolution of single static assignment form compiler output into static register assignment (and register spilling to local memory), inline expansion of functions, and emission of a statically compiled stack reservation. All of these are potentially performance optimizing operations.
    3. This is an informational warning from the assembler, which is telling you that during a static code analysis, the assembler was unable to determine the stack size.
    4. The most normal scenario when the assembler emits this warning is when recursion is detected within kernel code. Your use case is clearly another.
    5. The warning comes from an assembler optimisation pass. The assembler is letting you know that there are potentially performance improvement opportunities being left on the table because the structure of the compiler output of your code can't allow it to statically determine the stack size
    6. The fallback the assembler will use will be more boilerplate SASS to set-up and tear-down the per thread stack which the kernel will require to run. The warning is letting you know that happened.