Is it possible to send data from JCuda to GPU memory that is defined as a union?

I have defined a new data type like this on the GPU side (CUDA):

typedef union {
    int i;
    double d;
    long l;
    char s[16];
} data_unit;

data_unit *d_array;

In Java, we have an array whose element type is one of the member types of that union. Normally, if we have an array of int for example, we can do the following in Java (JCuda):

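import static jcuda.driver.JCudaDriver.*;
import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;

int data_size; // number of elements, set elsewhere
CUdeviceptr d_array = new CUdeviceptr(); // cuMemAlloc needs an instance to fill in
int[] h_array = new int[data_size];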

cuMemAlloc(d_array, data_size * Sizeof.INT);
cuMemcpyHtoD(d_array, Pointer.to(h_array), data_size * Sizeof.INT);

But how can this be done if the array on the device has our union as its element type? (Assume that h_array is still of type int.)

int data_size;
CUdeviceptr d_array = new CUdeviceptr();
int[] h_array = new int[data_size];

cuMemAlloc(d_array, data_size * Sizeof.?);
// Here we should have some type of alignment (?)
cuMemcpyHtoD(d_array, Pointer.to(h_array), data_size * Sizeof.?);

Solution

I believe there is a fundamental misunderstanding here of what a union is.

Let's think about it. What makes a union different from a struct? It can store different types of data at different times, in the same memory.

How does it accomplish this feat? One could use a separate variable to record the current type, or how much memory it takes up, but a union does not do this. It relies on the programmer knowing exactly which type they want to retrieve, and when. Since only the programmer knows the type at any given point in time, the only option left is to make sure that enough space is allocated for the union variable to hold any of its member types.

Indeed, this is what a union does (the C/C++ rules apply to CUDA as well). What does that mean for you? It means that the size of your union array should be the size of one union times the number of elements, and the size of a union is the size of its largest member (rounded up, if necessary, to satisfy the strictest alignment among its members).
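To make the shared-storage idea concrete in Java terms, here is a minimal sketch that uses a 16-byte ByteBuffer as a stand-in for one union slot. The class name UnionSlot and its methods are hypothetical, and the 16-byte size is an assumption taken from the union in the question; this is only an illustration of the concept, not something JCuda provides.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for one data_unit: 16 bytes that every "member" shares.
class UnionSlot {
    static final int SLOT_BYTES = 16; // assumed sizeof(data_unit)

    final ByteBuffer bytes =
        ByteBuffer.allocate(SLOT_BYTES).order(ByteOrder.nativeOrder());

    // Every member starts at offset 0, just like the members of a union.
    void setInt(int i)       { bytes.putInt(0, i); }
    void setDouble(double d) { bytes.putDouble(0, d); }
    int asInt()              { return bytes.getInt(0); }
    long asLong()            { return bytes.getLong(0); }

    public static void main(String[] args) {
        UnionSlot slot = new UnionSlot();
        slot.setDouble(1.5);
        // Reading through a different member reinterprets the same bytes.
        System.out.println(slot.asLong() == Double.doubleToRawLongBits(1.5)); // true
        slot.setInt(42);
        System.out.println(slot.asInt()); // 42
    }
}
```

Nothing in the slot records which member was written last; as with the real union, the caller has to know.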

Let's look at your union to see how to figure it out.

typedef union {
    int i;
    double d;
    long l;
    char s[16];
} data_unit;

Your union has:

int i, which we assume to be 4 bytes
double d, which is 8 bytes
long l, which is platform-dependent: it can be either 4 or 8 bytes depending on the compiler/platform; we assume 8 bytes for now
char s[16], easy, 16 bytes
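The tally above can be sketched as a small calculation. The helper name unionSize is hypothetical, and the member sizes and alignments passed in are the assumptions from the list (in particular, long is taken to be 8 bytes); the authoritative value is whatever sizeof(data_unit) reports in the device code.

```java
// Sketch: size of a union = largest member size, rounded up to the strictest alignment.
class UnionSize {
    // sizes[i] and aligns[i] describe member i, in bytes (assumed values, not queried).
    static int unionSize(int[] sizes, int[] aligns) {
        int maxSize = 0;
        int maxAlign = 1;
        for (int s : sizes) maxSize = Math.max(maxSize, s);
        for (int a : aligns) maxAlign = Math.max(maxAlign, a);
        // Round maxSize up to the next multiple of maxAlign.
        return ((maxSize + maxAlign - 1) / maxAlign) * maxAlign;
    }

    public static void main(String[] args) {
        int[] sizes  = {4, 8, 8, 16}; // int, double, long (assumed 8), char[16]
        int[] aligns = {4, 8, 8, 1};
        System.out.println(unionSize(sizes, aligns)); // 16
    }
}
```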

So the largest member is char s[16], at 16 bytes. This means that you will need to change your code to:

int data_size; // number of unions, set elsewhere
int union_size = 16; // sizeof(data_unit) on the device
CUdeviceptr d_array = new CUdeviceptr();
// Copying tightly packed ints (4 bytes each) would not line up with the
// 16-byte unions on the device, so the host array is over-allocated and
// the real values are placed with a "stride".
// union_size / Sizeof.INT = 4, so there are 4 ints for each union.
int[] h_array = new int[data_size * (union_size / Sizeof.INT)];

// Allocate by the size of the union, not the size of an int.
cuMemAlloc(d_array, data_size * union_size);
// Again, we copy data_size * union_size bytes.
cuMemcpyHtoD(d_array, Pointer.to(h_array), data_size * union_size);

NOTE

If you want to copy ints over, only every 4th element of h_array carries a real value; the three ints after it are padding that fills out the rest of the 16-byte union.

int 0 is h_array[0], int 1 is h_array[4], int 2 is h_array[8], and in general int n is h_array[n * 4].
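The index mapping above could be wrapped in a small helper like the following sketch. The names UnionPacking and packInts are hypothetical, and UNION_SIZE is the assumed 16-byte size of data_unit; Integer.BYTES is used in place of JCuda's Sizeof.INT (both are 4) so that the snippet compiles without JCuda on the classpath.

```java
import java.util.Arrays;

class UnionPacking {
    static final int UNION_SIZE = 16; // assumed sizeof(data_unit)

    // Spread tightly packed ints so that value n lands at index n * stride.
    static int[] packInts(int[] values) {
        int stride = UNION_SIZE / Integer.BYTES; // 4 ints per union
        int[] packed = new int[values.length * stride];
        for (int n = 0; n < values.length; n++) {
            packed[n * stride] = values[n]; // remaining ints in the slot stay 0
        }
        return packed;
    }

    public static void main(String[] args) {
        int[] h_array = packInts(new int[] {7, 8, 9});
        System.out.println(Arrays.toString(h_array));
        // [7, 0, 0, 0, 8, 0, 0, 0, 9, 0, 0, 0]
    }
}
```

The resulting array has data_size * 4 elements, so it can be passed to cuMemcpyHtoD with a byte count of data_size * union_size, as in the code above.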