[SOLVED] What redistributable is required from end-users to run OpenCL applications?

I have developed a C++ application on an NVIDIA-based OpenCL architecture and wish to distribute it to end users.

Unfortunately, it appears that users with an ATI card cannot run my game, because the DLL that contains my OpenCL code won't even load (dynamically), while users with NVIDIA drivers appear to be able to load my DLL.

What is the recommended 'best practice' when shipping an OpenCL-based app that 'runs anywhere'? Is it possible for the app provider to include all the DLLs that will enable all users to use the app, or are users on a different OpenCL architecture forced to download that architecture's OpenCL SDK?

Many thanks!

EDIT: Curiously the missing dll dependency was resolved by adding NVCuda.dll to my build. (Would like to remove that!) However the answers provided here are quite useful for 'best practice' in regards to building an OpenCL app that can run on most platforms...

Solution

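They need only the GPU drivers; the vendor's driver installs the OpenCL runtime, so end users do not need an SDK. For an Intel CPU, they may have to download the necessary OpenCL runtime binaries manually.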
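Since the runtime comes from the driver rather than from your installer, one option is to check for it at startup instead of letting the whole DLL fail to load. The following is only a minimal sketch of that idea: the function names are made up for illustration, and the library names ("OpenCL.dll" on Windows, "libOpenCL.so.1" or "libOpenCL.so" elsewhere) are the usual loader names, which I am assuming apply on the target machines.

```cpp
#include <string>

#if defined(_WIN32)
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Hypothetical helper: returns true if the shared library `name`
// can be loaded at runtime. The library is unloaded again right away.
bool can_load_library(const std::string& name) {
#if defined(_WIN32)
    HMODULE handle = LoadLibraryA(name.c_str());
    if (handle == nullptr) return false;
    FreeLibrary(handle);
    return true;
#else
    void* handle = dlopen(name.c_str(), RTLD_LAZY);
    if (handle == nullptr) return false;
    dlclose(handle);
    return true;
#endif
}

// Hypothetical helper: returns true if an OpenCL loader library seems
// to be installed. The names below are the usual ones (an assumption).
bool opencl_runtime_present() {
#if defined(_WIN32)
    return can_load_library("OpenCL.dll");
#else
    return can_load_library("libOpenCL.so.1") ||
           can_load_library("libOpenCL.so");
#endif
}
```

If opencl_runtime_present() returns false, the app can show a "please install or update your graphics driver" message, or fall back to a non-OpenCL code path, instead of failing with a missing-DLL error. On older Linux toolchains this may need linking with -ldl.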

AMD's device compiler takes some time to compile, while Nvidia's compiles quickly. Compile time is very low when you target a CPU. I converted a basic C++ fluid and raytracer simulation into an OpenCL version and it compiled after 3 minutes! (I mean the device-side OpenCL C compilation of the kernels.) If you want to give people an already-compiled project, you would need access to every single type of card, and you would have to compile and save binaries for all of them.
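A different approach from collecting every card yourself (this is my suggestion, not something tested above) is to compile once on the user's machine and cache the result, reading the binary back with clGetProgramInfo(CL_PROGRAM_BINARIES) and loading it later with clCreateProgramWithBinary. Because a binary is only valid for one device and driver, the cache file name should depend on both. Here is a rough sketch of just the naming part; kernel_cache_file and the "kernel_<hash>.bin" naming scheme are hypothetical, and the device name and driver version strings are assumed to come from clGetDeviceInfo (CL_DEVICE_NAME, CL_DRIVER_VERSION).

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash: small, deterministic, and identical on every
// compiler (unlike std::hash, whose result is implementation-defined).
std::uint64_t fnv1a(const std::string& text) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 0x100000001b3ULL;                // FNV prime
    }
    return hash;
}

// Hypothetical helper: builds a cache file name that changes whenever
// the device, the driver version, or the kernel source changes.
std::string kernel_cache_file(const std::string& device_name,
                              const std::string& driver_version,
                              const std::string& kernel_source) {
    const std::string identity =
        device_name + '\n' + driver_version + '\n' + kernel_source;
    return "kernel_" + std::to_string(fnv1a(identity)) + ".bin";
}
```

On startup you would look for that file; if it is missing (new card, new driver, or changed kernels), build from source and save the binary under the new name. The user pays the long compile once instead of on every launch.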

Some OpenGL/OpenCL/DirectX sharing (interop) operations can be incompatible between vendors.

Don't use platform-specific constants; they may not be fully mapped on other platforms.

Tell people which OpenCL version you target.

Don't use a local work group size larger than 256 for GPU computing. AMD GPUs' maximum local work group size is 256, while Nvidia's is 1024.
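To make that limit concrete, here is a small sketch of how the sizes passed to clEnqueueNDRangeKernel could be chosen. The struct and function names are hypothetical; device_max_local is assumed to be the value you queried with clGetDeviceInfo(CL_DEVICE_MAX_WORK_GROUP_SIZE), or with clGetKernelWorkGroupInfo(CL_KERNEL_WORK_GROUP_SIZE) for a specific kernel.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical result type: the two sizes to hand to the enqueue call.
struct WorkSizes {
    std::size_t local;   // work items per work group
    std::size_t global;  // total work items, a multiple of `local`
};

// Hypothetical helper: caps the local size at both the preferred value
// (256 by default) and the limit reported by the device, then rounds
// the global size up to the next multiple of the local size.
WorkSizes choose_work_sizes(std::size_t item_count,
                            std::size_t device_max_local,
                            std::size_t preferred_local = 256) {
    const std::size_t local = std::max<std::size_t>(
        1, std::min(preferred_local, device_max_local));
    const std::size_t groups = (item_count + local - 1) / local;
    return WorkSizes{local, groups * local};
}
```

For example, choose_work_sizes(1000, 1024) gives a local size of 256 and a global size of 1024. Because the global size is rounded up, the kernel has to skip the padding items itself, e.g. with "if (get_global_id(0) >= n) return;" at the top.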

Don't spill private registers; reduce the depth of pseudo-recursive functions if you badly need them. Sometimes the AMD compiler tries to optimize so aggressively that it blows up at native device compile time.

Use a "platform & device query wrapper" of your own that finds a proper GPU; don't just take platform[0] or device[0]. Users may have multiple platforms, such as Intel's for the CPU and AMD's for the GPU, or even all of them. The GPUs integrated in APUs may be reported as ACC (accelerator) instead of GPU (I'm not sure about this).
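The selection part of such a wrapper could look roughly like this. It is only a sketch: DeviceEntry is a made-up struct that I assume you fill by looping over clGetPlatformIDs, clGetDeviceIDs and clGetDeviceInfo, and the ranking (GPU first, then accelerator, then CPU, with more compute units as a crude tie-breaker) is my own assumption, not an official rule.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// These values match CL_DEVICE_TYPE_CPU / _GPU / _ACCELERATOR
// in the OpenCL headers.
constexpr std::uint64_t kTypeCpu = 1u << 1;
constexpr std::uint64_t kTypeGpu = 1u << 2;
constexpr std::uint64_t kTypeAccelerator = 1u << 3;

// Hypothetical record, one per device found on any platform.
struct DeviceEntry {
    std::string platform_name;
    std::string device_name;
    std::uint64_t type;      // CL_DEVICE_TYPE bitfield
    unsigned compute_units;  // CL_DEVICE_MAX_COMPUTE_UNITS
};

// Higher rank is preferred; 0 means "do not use".
int device_rank(const DeviceEntry& d) {
    if (d.type & kTypeGpu) return 3;
    if (d.type & kTypeAccelerator) return 2;
    if (d.type & kTypeCpu) return 1;
    return 0;
}

// Returns the index of the preferred device, or -1 if none is usable.
int pick_device(const std::vector<DeviceEntry>& devices) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        const int rank = device_rank(devices[i]);
        if (rank == 0) continue;
        if (best < 0) { best = i; continue; }
        const int best_rank = device_rank(devices[best]);
        if (rank > best_rank ||
            (rank == best_rank &&
             devices[i].compute_units > devices[best].compute_units)) {
            best = i;
        }
    }
    return best;
}
```

With an Intel CPU platform listed first and an AMD GPU platform second, this picks the GPU even though it is not platform[0]. Compute unit counts are not really comparable between vendors, so treat the tie-breaker as a starting point and let the user override the choice if possible.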

Implicit synchronization of kernels and buffer transfers that runs successfully on your system may not work on other systems.

Check that your DLLs and app have the same bitness as other people's machines and OS. If you target 64-bit and they have a 32-bit OS, it will not work.