I have a template function whose template parameter is an integer. This integer is used to generate different kernels. Previously, all the possible instantiations were listed by hand in a table (it works, but it is ugly), so I tried to use the solution proposed here. Because I have more than 800 possible kernels, the template recursion method is much more elegant. I have tested the template recursion on the C++ version of my code and it works perfectly, but nvcc seems to limit the recursion depth of my instantiation.
Here is a simplified example of my previous, ugly template instantiation list, which works properly (even with the 800 kernel instantiations):
// the template kernel
template <int i> __global__ void kernel(int some_data)
{
    switch(i)
    {
        case 0:
            // do something
            break;
        case 1:
            // do some other things
            break;
        //...
        case 799:
            // do some other things
            break;
    }
}

// pointer to any instantiation of the kernel
typedef void (*kernel_pointer) (int some_data);

// the ugly huge list
kernel_pointer kernel_list[800] = {
    &kernel <0>,
    &kernel <1>,
    //...
    &kernel <799> };

int main()
{
    int kernel_index = 10;
    int the_data = 0; // placeholder argument
    // the call
    kernel_pointer my_kernel = kernel_list[kernel_index];
    my_kernel<<<1,1>>>(the_data);
}
Here is the beautiful template recursion that nvcc doesn't like. It replaces the list in the previous piece of code:
#define N_KERNELS 800

// the table is now filled at start-up instead of being written out by hand
kernel_pointer kernel_list[N_KERNELS];

template< int i> bool dispatch_init( kernel_pointer* pTable )
{
    pTable[i] = &kernel<i>;
    return dispatch_init<i-1>( pTable );
}

// edge case of recursion
template<> bool dispatch_init<-1>(kernel_pointer* pTable) { return true; }

// call the recursive function, starting from the last kernel index
const bool initialized = dispatch_init<N_KERNELS-1>( kernel_list );
In reality I don't have a single template parameter but six, which combine to create the hundreds of combinations. Otherwise, a switch with 800 cases would be really stupid. Does anyone know how to increase the nvcc template recursion limit, or another automatic way to create my list?
Edit: I found the gcc option -ftemplate-depth, which changes the template instantiation recursion limit, but I haven't found an equivalent nvcc option.
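On the question of another automatic way to create the list: one option, assuming a C++14-capable toolchain, is to build the table with a pack expansion over `std::index_sequence`, so the user code contains no recursive instantiation chain at all (the standard library typically generates the sequence with a compiler builtin or shallow recursion). The sketch below is plain host C++: `kernel<A, B>` is an ordinary function standing in for the `__global__` kernel, it writes to an `int*` only so the result can be checked, and `N_A`, `N_B`, `flat_kernel`, `make_table` and `run` are hypothetical names. Two parameters stand in for the six mentioned above. Whether a given nvcc version accepts this is an assumption the sketch does not verify.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical parameter ranges: two parameters stand in for the six in the question.
constexpr int N_A = 4;
constexpr int N_B = 5;
constexpr int N_KERNELS = N_A * N_B;

// Host stand-in for the templated kernel (assumption: a real version would be
// a __global__ function). It writes 100*A + B so each instantiation is distinguishable.
template <int A, int B>
void kernel(int* out)
{
    *out = 100 * A + B;
}

using kernel_pointer = void (*)(int*);

// Decode a flat index into the parameter pair at compile time.
template <int I>
kernel_pointer flat_kernel()
{
    return &kernel<I / N_B, I % N_B>;
}

// Expand the index pack into one table entry per flat index: no recursion here.
template <std::size_t... Is>
std::array<kernel_pointer, sizeof...(Is)> make_table(std::index_sequence<Is...>)
{
    return {{ flat_kernel<static_cast<int>(Is)>()... }};
}

// The whole table is built from a single expression.
const std::array<kernel_pointer, N_KERNELS> kernel_list =
    make_table(std::make_index_sequence<N_KERNELS>{});

// Runtime dispatch: flatten (a, b) the same way the table was built.
int run(int a, int b)
{
    int out = -1;
    kernel_list[a * N_B + b](&out);
    return out;
}
```

For example, `run(2, 3)` looks up entry `2 * N_B + 3` and calls `kernel<2, 3>`. With six parameters the same idea applies, with the flat index decoded by repeated division and remainder.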
Following Robert Crovella's suggestion to build the table in a few pieces, here is an example of how to work around the "error":
#include <cstdio>

#define N_KERNELS 850

// template kernel
template <int i> __global__ void kernel(int a)
{
    switch(i)
    {
        case 0:
            printf("%d\n", a*i);
            break;
        case 1:
            printf("%d\n", a*i);
            break;
        //...
        case 849:
            printf("%d\n", a*i);
            break;
    }
}

typedef void (*kernel_pointer) (int);
kernel_pointer kernel_list[N_KERNELS];

// Function that instantiates all the needed kernels using recursion.
template< int i> bool dispatch_init( kernel_pointer* pTable )
{
    pTable[i] = &kernel<i>;
    return dispatch_init<i+1>( pTable );
}

// Edge cases of recursion made with template specializations:
// each one stores its own kernel and stops the current chain.
template<> bool dispatch_init<199>(kernel_pointer* pTable)
{
    pTable[199] = &kernel<199>;
    return true;
}
template<> bool dispatch_init<399>(kernel_pointer* pTable)
{
    pTable[399] = &kernel<399>;
    return true;
}
template<> bool dispatch_init<599>(kernel_pointer* pTable)
{
    pTable[599] = &kernel<599>;
    return true;
}
template<> bool dispatch_init<799>(kernel_pointer* pTable)
{
    pTable[799] = &kernel<799>;
    return true;
}
template<> bool dispatch_init<N_KERNELS>(kernel_pointer* pTable) { return true; }

// Call the recursive function a few times to instantiate all the kernels
// without reaching the recursive instantiation limit.
// Each result needs its own name to avoid a redefinition error.
const bool initialized_0   = dispatch_init<0  >( kernel_list );
const bool initialized_200 = dispatch_init<200>( kernel_list );
const bool initialized_400 = dispatch_init<400>( kernel_list );
const bool initialized_600 = dispatch_init<600>( kernel_list );
const bool initialized_800 = dispatch_init<800>( kernel_list );

int main()
{
    int kernel_index = 10;
    kernel_pointer my_kernel = kernel_list[kernel_index];
    my_kernel<<<1,1>>>(6);
    cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
}
I don't like this fix, but it will do the trick for the moment. If an NVIDIA developer comes by some day: it would be a good idea to add an equivalent of -ftemplate-depth to nvcc, no?
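The manual split into pieces can also be automated. A minimal sketch, assuming nothing beyond C++03 templates: halve the index range at every step, so the instantiation depth grows with the logarithm of the table size (about 10 levels for 850 entries) instead of linearly. This is host-only C++ for illustration: `kernel<I>` is an ordinary function standing in for the `__global__` kernel, its extra `int*` argument exists only so the result can be checked, and `fill_range` and `run` are hypothetical names. It has not been checked against any particular nvcc version.

```cpp
const int N_KERNELS = 850;

// Host stand-in for the templated kernel (assumption: a real version would be
// a __global__ function that prints a * I instead of storing it).
template <int I>
void kernel(int a, int* out)
{
    *out = a * I;
}

typedef void (*kernel_pointer)(int, int*);

kernel_pointer kernel_list[N_KERNELS];

// Fill table entries [Lo, Hi) by splitting the range in two halves.
// Requires Hi > Lo. Depth is about log2(Hi - Lo) instead of Hi - Lo.
template <int Lo, int Hi, bool Single = (Hi - Lo == 1)>
struct fill_range
{
    static void apply(kernel_pointer* table)
    {
        const int Mid = Lo + (Hi - Lo) / 2;
        fill_range<Lo, Mid>::apply(table);
        fill_range<Mid, Hi>::apply(table);
    }
};

// A range holding a single index stores that kernel and ends the recursion.
template <int Lo, int Hi>
struct fill_range<Lo, Hi, true>
{
    static void apply(kernel_pointer* table)
    {
        table[Lo] = &kernel<Lo>;
    }
};

// One call instantiates and registers every kernel before main() runs.
const bool initialized = (fill_range<0, N_KERNELS>::apply(kernel_list), true);

// Runtime dispatch through the table.
int run(int kernel_index, int a)
{
    int out = -1;
    kernel_list[kernel_index](a, &out);
    return out;
}
```

A struct with a partial specialization is used because function templates cannot be partially specialized. Changing `N_KERNELS` needs no further edits, unlike the hand-placed specializations at 199, 399, 599 and 799.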