I have a C program that finds whether two sets of polygons overlap. The user inputs two sets of polygons (each set has several thousand polygons), and the program determines which polygons in set 1 overlap with which polygons in set 2.
I have two structs like these:
struct gpc_vertex /* Polygon vertex */
{
double x;
double y;
};
struct gpc_vertex_list /* Polygon contour */
{
int pid; // polygon id
int num_vertices;
double *mbr; // minimum bounding rectangle of the polygon, so always 4 elements
};
I have the following segment of code:
#pragma acc kernels copy(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2], listOfBoolean[0:dump])
for (i=0; i<polygonCount1; i++){
polygon1 = listOfPolygons1[i];
for (j=0; j<polygonCount2; j++){
polygon2 = listOfPolygons2[j];
idx = polygonCount2 * i + j;
listOfBoolean[idx] = isRectOverlap(polygon1.mbr, polygon2.mbr); // line 115
}
}
listOfPolygons1 and listOfPolygons2 are (as the names imply) arrays of gpc_vertex_list.
listOfBoolean is an array of int.
The mbrs of the two polygons are checked for overlap. The function "isRectOverlap" returns 1 if they overlap and 0 if they do not, and that value is stored in listOfBoolean.
Problem
The code compiles but fails at run time with the following error:
call to cuEventSynchronize returned error 700: Illegal address during kernel execution
My observation
The program compiles and runs if I change line 115 to this:
isRectOverlap(polygon1.mbr, polygon2.mbr); // without assigning value to listOfBoolean
or this:
listOfBoolean[idx] = 5; // assigning an arbitrary value
(the result is wrong, but at least it runs)
Question
Neither "isRectOverlap" nor "listOfBoolean" seems to cause the problem on its own, as long as the return value of "isRectOverlap" is not stored in "listOfBoolean".
Does anyone know why the program fails when I assign the return value of "isRectOverlap" to "listOfBoolean"?
isRectOverlap function is like this:
int isRectOverlap(double *shape1, double *shape2){
if (shape1[0] > shape2[2] || shape2[0] > shape1[2]){
return 0;
}
if (shape1[1] < shape2[3] || shape2[1] < shape1[3]){
return 0;
}
return 1;
}
The program runs correctly when built without OpenACC.
Thanks for helping
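When aggregate data types are used in an OpenACC data clause, a shallow copy of the type is performed. What's most likely happening here is that when the listOfPolygons arrays are copied to the device, each "mbr" member still contains a host address. Hence, the program gives an illegal address error when "mbr" is dereferenced on the device. That would also explain your observations: when the return value is discarded, or a constant is stored instead, the compiler is free to optimize away the reads through "mbr", so the bad pointers are probably never dereferenced.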
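To make the shallow copy concrete, here is a host-only analogy (a minimal sketch, not OpenACC code). The helper name shallowCopy is hypothetical. It copies the struct byte for byte, which is roughly what the data clause does for each array element, and the copy ends up holding the same "mbr" address as the original. On the device, that address still refers to host memory.

```c
#include <string.h>

struct gpc_vertex_list /* Polygon contour, as in the question */
{
    int pid;
    int num_vertices;
    double *mbr; /* points to 4 doubles stored elsewhere */
};

/* Hypothetical helper: byte-wise copy of the struct only.
   The 4 doubles that mbr points to are NOT copied. */
void shallowCopy(struct gpc_vertex_list *dst, const struct gpc_vertex_list *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

After the call, dst->mbr and src->mbr are the same pointer value, so only the original block of four doubles exists.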
Given that the comment says "mbr" always has 4 elements, the simplest fix is to make "mbr" a fixed-size array of 4 doubles, so the values are stored inside the struct and travel with the shallow copy.
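A minimal sketch of that first option follows. The wrapper function findOverlaps, the "routine seq" directive, and the "parallel loop collapse(2)" form are my assumptions rather than anything from your code; the struct, isRectOverlap and the index calculation are taken from the question. The mbr layout implied by isRectOverlap appears to be {xmin, ymax, xmax, ymin}.

```c
struct gpc_vertex_list /* Polygon contour */
{
    int pid;          /* polygon id */
    int num_vertices;
    double mbr[4];    /* was: double *mbr; now stored inside the struct */
};

/* Same test as in the question, marked so it can be called on the device. */
#pragma acc routine seq
int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]) return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]) return 0;
    return 1;
}

/* Hypothetical wrapper: fills listOfBoolean[polygonCount2 * i + j]
   for every pair (i, j). */
void findOverlaps(const struct gpc_vertex_list *listOfPolygons1, int polygonCount1,
                  const struct gpc_vertex_list *listOfPolygons2, int polygonCount2,
                  int *listOfBoolean)
{
    int total = polygonCount1 * polygonCount2;
    #pragma acc parallel loop collapse(2) \
        copyin(listOfPolygons1[0:polygonCount1], listOfPolygons2[0:polygonCount2]) \
        copyout(listOfBoolean[0:total])
    for (int i = 0; i < polygonCount1; i++) {
        for (int j = 0; j < polygonCount2; j++) {
            listOfBoolean[polygonCount2 * i + j] =
                isRectOverlap(listOfPolygons1[i].mbr, listOfPolygons2[j].mbr);
        }
    }
}
```

Since the polygon lists are only read and listOfBoolean is only written, copyin and copyout should also avoid the extra transfers that a plain copy would do.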
Assuming you're using PGI compilers with an NVIDIA device, a second method is to use CUDA Unified Memory by compiling with "-ta=tesla:managed". All dynamically allocated memory is then handled by the CUDA runtime, which allows host addresses to be accessed on the device. The caveats are that it only applies to dynamically allocated data, your whole program can only use as much memory as is available on the device, and it may slow down your program. See http://www.pgroup.com/lit/articles/insider/v6n2a4.htm
A third option is to perform a deep copy of the aggregate type to the device. I can post an example if you decide to go this route. I also talk about the subject as part of a presentation I did at GTC 2015: https://www.youtube.com/watch?v=rWLmZt_u5u4
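In the meantime, here is a rough sketch of what a manual deep copy could look like while keeping "mbr" as a pointer. It assumes a compiler that attaches the device pointer when a member array is copied in after its parent struct is already present (the behavior described in OpenACC 2.6); please check that against your compiler version. The helper names copyPolygonsToDevice, deletePolygonsFromDevice and findOverlaps are hypothetical.

```c
struct gpc_vertex_list /* Polygon contour, unchanged from the question */
{
    int pid;
    int num_vertices;
    double *mbr; /* 4 doubles allocated on the host */
};

/* Hypothetical helper: copy the array of structs first, then each mbr block. */
void copyPolygonsToDevice(struct gpc_vertex_list *list, int count)
{
    #pragma acc enter data copyin(list[0:count])
    for (int i = 0; i < count; i++) {
        /* Assumption: because list[i] is already present, the device copy
           of list[i].mbr gets attached to the device copy of the 4 doubles. */
        #pragma acc enter data copyin(list[i].mbr[0:4])
    }
}

/* Hypothetical helper: release in the reverse order, members before parent. */
void deletePolygonsFromDevice(struct gpc_vertex_list *list, int count)
{
    for (int i = 0; i < count; i++) {
        #pragma acc exit data delete(list[i].mbr[0:4])
    }
    #pragma acc exit data delete(list[0:count])
}

/* Same test as in the question, marked so it can be called on the device. */
#pragma acc routine seq
int isRectOverlap(const double *shape1, const double *shape2)
{
    if (shape1[0] > shape2[2] || shape2[0] > shape1[2]) return 0;
    if (shape1[1] < shape2[3] || shape2[1] < shape1[3]) return 0;
    return 1;
}

/* Hypothetical wrapper around the loop from the question. */
void findOverlaps(struct gpc_vertex_list *set1, int count1,
                  struct gpc_vertex_list *set2, int count2,
                  int *listOfBoolean)
{
    int total = count1 * count2;
    copyPolygonsToDevice(set1, count1);
    copyPolygonsToDevice(set2, count2);
    #pragma acc parallel loop collapse(2) \
        present(set1[0:count1], set2[0:count2]) copyout(listOfBoolean[0:total])
    for (int i = 0; i < count1; i++) {
        for (int j = 0; j < count2; j++) {
            listOfBoolean[count2 * i + j] = isRectOverlap(set1[i].mbr, set2[j].mbr);
        }
    }
    deletePolygonsFromDevice(set1, count1);
    deletePolygonsFromDevice(set2, count2);
}
```

With several thousand polygons per set this issues one small transfer per polygon, so the fixed-size array is likely to be both simpler and faster for this particular struct.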