opencl memory-barriers barrier memory-fences

OpenCL 1.2: mem_fence() or barrier() or both


I have just started OpenCL C programming. All work-items of a work-group update unique locations in local memory. Later, a private variable of a work-item is updated based on local data written by two other work-items. Something like this:

__kernel void MyKernel(__global int *in_ptr)
{
    /* Define a variable in private address space */
    int priv_data;
    /* Define two indices in private address space */
    int index1, index2;

    /* index1 and index2 are legitimate local work-group indices */
    index1 = SOME_CORRECT_VALUE;
    index2 = ANOTHER_CORRECT_VALUE;

    /* Define storage in local memory large enough for all work-items of this work-group */
    __local int tempPtr[WORK_GROUP_SIZE];
    tempPtr[get_local_id(0)] = SOME_RANDOM_VALUE;

    /* Do not proceed until the update of tempPtr by this WI has completed */
    mem_fence(CLK_LOCAL_MEM_FENCE);

    /* Do not proceed until all WIs of this WG have updated tempPtr */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Update private data */
    priv_data = tempPtr[index1] + tempPtr[index2];
}

The snippet above is deliberately conservative, but wouldn't barrier() alone have done the job, since it performs a fence internally?


Solution

  • Yes, barrier already does fencing.

    A barrier synchronizes execution at that point: every work-item in the work-group must reach it before any of them may continue. It also orders the memory operations selected by its flag, so local memory is consistent at that point. A fence only orders the calling work-item's own reads and writes: those issued before the fence complete before any issued after it. It does not synchronize work-items, which may be executing different instructions.

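    A fence alone is enough only when you do not care about work-items going out of sync and just need one work-item's earlier reads and writes to complete before its later ones. That is not your case: each work-item reads elements of tempPtr written by other work-items, so the barrier is required and the mem_fence() before it is redundant. (If that code runs in a loop, or there is extra code not shown in the example, a second barrier may be needed before tempPtr is overwritten again.)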
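
    As a rough sketch of the points above, the kernel below keeps only the barrier, and a second kernel shows the loop case. Several details are assumptions made for illustration and are not in the question: the out_ptr and iterations arguments, the choice of neighbouring indices for index1 and index2, the MyKernelLoop name, and a WORK_GROUP_SIZE of 64 fixed with reqd_work_group_size.

```c
/* Assumed for illustration: a compile-time work-group size. It could
   also be supplied with -D WORK_GROUP_SIZE=... when building the program. */
#ifndef WORK_GROUP_SIZE
#define WORK_GROUP_SIZE 64
#endif

/* Single pass: one barrier is sufficient, no separate mem_fence needed. */
__kernel __attribute__((reqd_work_group_size(WORK_GROUP_SIZE, 1, 1)))
void MyKernel(__global const int *in_ptr, __global int *out_ptr)
{
    __local int tempPtr[WORK_GROUP_SIZE];

    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);
    const size_t n   = get_local_size(0);

    /* Hypothetical index choice: the two neighbours, wrapping around. */
    const size_t index1 = (lid + 1) % n;
    const size_t index2 = (lid + n - 1) % n;

    /* Each work-item writes its own slot. */
    tempPtr[lid] = in_ptr[gid];

    /* All work-items of the group must arrive here, and their local
       memory writes are ordered before the reads that follow. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Now it is safe to read slots written by other work-items. */
    const int priv_data = tempPtr[index1] + tempPtr[index2];
    out_ptr[gid] = priv_data;
}

/* Loop variant: a second barrier keeps a fast work-item from
   overwriting its slot while a slower one is still reading it. */
__kernel __attribute__((reqd_work_group_size(WORK_GROUP_SIZE, 1, 1)))
void MyKernelLoop(__global const int *in_ptr, __global int *out_ptr,
                  const int iterations /* hypothetical argument */)
{
    __local int tempPtr[WORK_GROUP_SIZE];

    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);
    const size_t n   = get_local_size(0);
    const size_t index1 = (lid + 1) % n;
    const size_t index2 = (lid + n - 1) % n;

    int acc = 0;
    /* iterations is the same for every work-item, so all of them
       reach each barrier the same number of times. */
    for (int i = 0; i < iterations; ++i) {
        tempPtr[lid] = in_ptr[gid] + i;
        barrier(CLK_LOCAL_MEM_FENCE);   /* writes done before reads  */
        acc += tempPtr[index1] + tempPtr[index2];
        barrier(CLK_LOCAL_MEM_FENCE);   /* reads done before rewrite */
    }
    out_ptr[gid] = acc;
}
```

    Both kernels assume the host enqueues them with a local size equal to WORK_GROUP_SIZE and a global size that is a multiple of it. A barrier must be reached by every work-item in the group, so it should not sit inside a branch that only some work-items take.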