So I have an OpenCL program that prints out the following information:
Version ....... OpenCL 1.2 (Mar 15 2018 21:59:37)
Vendor ........ Apple
Profile ....... FULL_PROFILE
Name .......... Apple
GPUS:
Device #0
Max work group size ......... 256
Max work item dimensions .... 3
Max work item sizes ......... 256 256 256
Name ........................ Intel(R) HD Graphics 630
Platform ....................
Profile ..................... FULL_PROFILE
Vendor ...................... Intel Inc.
Version ..................... OpenCL 1.2
Driver version .............. 1.2(Mar 15 2018 22:04:21)
Device #1
Max work group size ......... 256
Max work item dimensions .... 3
Max work item sizes ......... 256 256 256
Name ........................ AMD Radeon Pro 560 Compute Engine
Platform ....................
Profile ..................... FULL_PROFILE
Vendor ...................... AMD
Version ..................... OpenCL 1.2
Driver version .............. 1.2 (Mar 15 2018 21:59:57)
CPUS:
Device #0
Max work group size ......... 1024
Max work item dimensions .... 3
Max work item sizes ......... 1024 1 1
Name ........................ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Platform ....................
Profile ..................... FULL_PROFILE
Vendor ...................... Intel
Version ..................... OpenCL 1.2
Driver version .............. 1.1
So the question: On the AMD, it has 3 dimensions with 256 in each dimension. Does this mean that it can do 256^3 parallel computations simultaneously? Or is there another meaning to that information? And in the same vein, can the Intel HD Graphics perform the same exact computations? Why are they separate cards then?
It does not specify the amount of work your GPU can do in parallel. "Max work item sizes" specifies the maximum size of a work group in each dimension. You are also bounded by "Max work group size", which width * height * depth cannot exceed. Furthermore, each kernel has its own maximum work group size, which can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE.
In my experience, you usually don't want to approach these limits; your kernel will often run faster with smaller work groups. Unless you have some reason to need big work groups, just make them 32 or 64 items. Or, if you don't use shared local memory, leave the local work size default (NULL) and let the runtime choose one (but keep your global work size something that can be divided nicely, or you'll end up with suboptimal work group sizes).
Your Intel and AMD GPUs are reported separately because they are separate devices. As to why Apple put two GPUs in one box, that's up to them; usually it's so the user can trade off speed against power usage.