CUDA unified memory implementation for vector sum

I tried to implement vector addition using the Unified Memory architecture. Here is my code:

#include<stdio.h>
#include<cuda.h>

#define n 10
__global__ void vec_add(float *c, float *a, float *b, int n){

        int i;
        //Get global thread ID
        i = blockDim.x*blockIdx.x+threadIdx.x;
        if(i<n){
                c[i] = a[i] + b[i];
        }
}

int main(int argc, char* argv[]){

        int thread_count;
        float *a, *b, *c;
        
        thread_count = strtol(argv[1], NULL, 10);
        cudaMallocManaged(&c, n*sizeof(float));
        cudaMallocManaged(&a, n*sizeof(float));
        cudaMallocManaged(&b, n*sizeof(float));

        for(int i=0; i<n; i++){
                a[i]=1.0;
                b[i]=2.0;
        }

        //Launch Kernel
        vec_add<<<1,thread_count>>>(c, a, b, n);

        //Synchronize threads
        cudaDeviceSynchronize();

        for(int i=0; i<n; i++){
                printf("%f + %f =%f\n", a[i], b[i], c[i]);
        }
        cudaFree(c);
        cudaFree(a);
        cudaFree(b);

        return 0;
}

I got this error when I ran the code: expected a ")". I could not find any problem with the parentheses. How can I fix this error? I would also like a brief description of how to structure a CUDA program that uses unified memory.

Solution

Here is the brief description.

The problem you have is here:

#define n 10
__global__ void vec_add(float *c, float *a, float *b, int n){

A C++ preprocessor macro (#define) creates a text substitution that the preprocessor performs before the compiler sees the code. It applies to every later occurrence of the name, including function parameter names. So #define n 10 tells the preprocessor to change your kernel definition line into this:

__global__ void vec_add(float *c, float *a, float *b, int 10){
                                                      ^^^^^^

And of course that is not valid C++ syntax for a function definition. One possible way to fix this would be to change your variable name in the (kernel) function definition to be something other than n, perhaps like this:

#define n 10
__global__ void vec_add(float *c, float *a, float *b, int nk){

        int i;
        //Get global thread ID
        i = blockDim.x*blockIdx.x+threadIdx.x;
        if(i<nk){
                c[i] = a[i] + b[i];
        }
}
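
With the parameter renamed, the rest of the program follows the usual unified-memory pattern: allocate with cudaMallocManaged, initialize on the host, launch the kernel, call cudaDeviceSynchronize before the host reads the results, then release the buffers with cudaFree. The following is only a sketch of that structure, not the exact program from the question: the default of 256 threads, the block-count calculation, and the error check are assumptions added for illustration.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define n 10

__global__ void vec_add(float *c, const float *a, const float *b, int nk){
        // Global thread ID
        int i = blockDim.x*blockIdx.x + threadIdx.x;
        if(i < nk){
                c[i] = a[i] + b[i];
        }
}

int main(int argc, char* argv[]){
        // Threads per block from the command line; 256 is an assumed default
        int thread_count = (argc > 1) ? (int)strtol(argv[1], NULL, 10) : 256;
        if(thread_count < 1){
                fprintf(stderr, "thread count must be positive\n");
                return 1;
        }
        // Enough blocks to cover all n elements (an addition; the question used 1 block)
        int block_count = (n + thread_count - 1)/thread_count;

        float *a, *b, *c;

        // 1. Allocate managed memory, accessible from both host and device
        cudaMallocManaged(&a, n*sizeof(float));
        cudaMallocManaged(&b, n*sizeof(float));
        cudaMallocManaged(&c, n*sizeof(float));

        // 2. Initialize the inputs directly on the host
        for(int i=0; i<n; i++){
                a[i] = 1.0f;
                b[i] = 2.0f;
        }

        // 3. Launch the kernel on the managed pointers
        vec_add<<<block_count, thread_count>>>(c, a, b, n);

        // 4. Check the launch, then wait for the GPU before the host reads c
        cudaError_t err = cudaGetLastError();
        if(err == cudaSuccess){
                err = cudaDeviceSynchronize();
        }
        if(err != cudaSuccess){
                fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
                return 1;
        }

        // 5. Read the results on the host
        for(int i=0; i<n; i++){
                printf("%f + %f = %f\n", a[i], b[i], c[i]);
        }

        // 6. Free the managed allocations
        cudaFree(a);
        cudaFree(b);
        cudaFree(c);

        return 0;
}
```

Assuming the file is saved as vec_add.cu (a made-up name), it could be built with nvcc vec_add.cu -o vec_add and run as ./vec_add 32.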

Even though this happens to be a CUDA kernel definition, the problem would be exactly the same if you wrote an ordinary function definition and used n as one of the function parameters. It comes from how the C++ preprocessor works, not from anything specific or unique to CUDA.
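
To illustrate the point without a GPU, here is a small host-only C++ sketch (nvcc should accept it as well). The names kCount and vec_add_host are made up for this example. It also shows another possible fix, which is an assumption rather than something the asker's code used: replace the macro with a constexpr constant. A constant obeys normal scoping rules, so it never rewrites a parameter name.

```cpp
// #define n 10   // with this macro active, the parameter list below would
//                // become "int 10" and the compiler would report: expected a ")"

// Typed constant (hypothetical name) used in place of the macro
constexpr int kCount = 10;

// Ordinary host function. The parameter can safely be called n because
// no macro named n exists to substitute it.
void vec_add_host(float *c, const float *a, const float *b, int n){
        for(int i = 0; i < n; i++){
                c[i] = a[i] + b[i];
        }
}
```

Calling vec_add_host(c, a, b, kCount) with every a[i] set to 1.0 and every b[i] set to 2.0 leaves 3.0 in every c[i]. Uncommenting the #define line brings back the same compile error as in the kernel version.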