Well, I've decided I'd rather use the GPU than the CPU, especially since I'm working on a game and I expect the FPS to increase. The thing is, I'm not sure where to start. I can easily set up JOCL or JCUDA, but after that I wouldn't know which parts of the code to move from the CPU to the GPU. Help is appreciated :)
What kind of computations are you after? If they are compute-intensive, such as N-body gravity experiments, you can simply copy the variables to the GPU, compute there, and then copy the results back to main memory.
If your objects have a lot of data but little computation per element, such as fluid dynamics or collision detection, you should set up interoperability between your graphics API and your compute API. Then the kernels work directly on the buffers that are already on the GPU, without any copying of data. (The speed-up is roughly your GPU RAM bandwidth divided by your PCIe bandwidth. For an HD7870 that is about 25x, provided compute power is not already saturated.)
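To make the bandwidth argument concrete, here is a back-of-the-envelope sketch. The numbers are my assumptions, not measurements: 153.6 GB/s as the HD 7870's peak memory bandwidth, 8 GB/s as the theoretical PCIe 2.0 x16 rate in one direction, and 6 GB/s as a guess at what a real transfer achieves. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Rough estimate of the interop speed-up for a bandwidth-bound kernel.
// All bandwidth figures below are assumed peak values, not measurements.
final class InteropSpeedupEstimate {

    /** Ratio of on-card memory bandwidth to host-to-device bus bandwidth. */
    static double bandwidthRatio(double vramGBps, double busGBps) {
        if (vramGBps <= 0 || busGBps <= 0) {
            throw new IllegalArgumentException("bandwidths must be positive");
        }
        return vramGBps / busGBps;
    }

    public static void main(String[] args) {
        double vram = 153.6;          // assumed HD 7870 peak memory bandwidth, GB/s
        double pcieTheoretical = 8.0; // assumed PCIe 2.0 x16 peak, one direction, GB/s
        double pcieAchieved = 6.0;    // assumed real-world transfer rate, GB/s

        System.out.printf(Locale.ROOT, "theoretical bus: %.1fx%n",
                bandwidthRatio(vram, pcieTheoretical));
        System.out.printf(Locale.ROOT, "achieved bus:    %.1fx%n",
                bandwidthRatio(vram, pcieAchieved));
    }
}
```

With those assumed figures the ratio comes out at roughly 19x for the theoretical bus rate and roughly 26x for the achievable one, which is in line with the estimate above. It only holds while the kernel is limited by memory traffic rather than arithmetic.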
I used JOCL and LWJGL with GL/CL interoperability in Java, and they worked very well.
For example, a neural network was trained on the CPU (Encog) but evaluated on the GPU (JOCL) to generate a map, which was then drawn by LWJGL. (The neuron weights are changed a little to add some randomness.)
Very important part is:
Example:
// clh is a fictional class that binds OpenCL to OpenGL through interoperability
// register the needed kernels with this object
clh.addKernel(
kernelFactory.fluidDiffuse(1024,1024), // enumeration is fluid1
kernelFactory.fluidAdvect(1024,1024), // enumeration is fluid2
kernelFactory.rigidBodySphereSphereInteracitons(2048,32,32),
kernelFactory.fluidRigidBodyInteractions(false), // fluidRigid
kernelFactory.rayTracingShadowForFluid(true),
kernelFactory.rayTracingBulletTargetting(true),
kernelFactory.gravity(G),
kernelFactory.gravitySphereSphere(), // enumeration is fall
kernelFactory.NNBotTargetting(3,10,10,2,numBots) // Encog
);
clh.addBuffers(
// enumeration is buf1 and is used as fluid1, fluid2 kernels' arguments
bufferFactory.fluidSurfaceVerticesPosition(1024,1024, fluid1, fluid2),
// enumeration is buf2, used by fluid1 and fluid2
bufferFactory.fluidSurfaceColors(1024,1024,fluid1, fluid2),
// enumeration is buf3, used by network
bufferFactory.NNBotTargetting(numBots*25, Encog)
);
Running kernels:
// shortcut for a sequence of kernels
int[] fluidCalculations = new int[]{fluid1, fluid2, fluidRigid, fluid1};
clh.run(fluidCalculations); // runs the registered kernels in this order:
// diffuse, advect, sphere-fluid interaction, diffuse again
// When a GPU buffer needs to be updated from main memory:
clh.sendData(cpuBuffer, buf1); // updates the fluid surface positions from main memory
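Since clh is fictional, here is one way the "enumeration" idea could work, as a plain-Java sketch with no OpenCL in it. KernelSequencer and its methods are names I made up, and a Runnable stands in for a compiled kernel; a real version would enqueue the kernel on a command queue instead of calling run().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the registration/sequencing part of clh.
// Each registered stage gets an integer id, and run() replays a list of ids.
final class KernelSequencer {
    private final List<Runnable> kernels = new ArrayList<>();

    /** Registers a stage and returns its id (its index in registration order). */
    int addKernel(Runnable kernel) {
        kernels.add(kernel);
        return kernels.size() - 1;
    }

    /** Runs the registered stages in exactly the order given by ids. */
    void run(int... ids) {
        for (int id : ids) {
            kernels.get(id).run();
        }
    }

    public static void main(String[] args) {
        KernelSequencer clh = new KernelSequencer();
        int fluid1 = clh.addKernel(() -> System.out.println("diffuse"));
        int fluid2 = clh.addKernel(() -> System.out.println("advect"));
        // Same stage can appear more than once in a sequence.
        clh.run(fluid1, fluid2, fluid1);
    }
}
```

Keeping the sequence as an int array means a whole frame's worth of work can be described once and replayed every frame without rebuilding anything.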
Converting CPU code to OpenCL code can be done automatically by APARAPI, but I'm not sure whether it supports interoperability.
If you need to do it yourself, then it is as easy as:
From Java:
for(int i=0;i<numParticles;i++)
{
for(int j=0;j<numParticles;j++)
{
particle.get(i).calculateAndAddForce(particle.get(j));
}
}
To a JOCL kernel string (actually very similar to what is inside calculateAndAddForce):
"__kernel void nBodyGravity(__global float * positions,__global float *forces)" +
"{" +
" int indis=get_global_id(0);" +
" int totalN=" + n + "; "+
" float x0=positions[0+3*(indis)];"+
" float y0=positions[1+3*(indis)];"+
" float z0=positions[2+3*(indis)];"+
" float fx=0.0f;" +
" float fy=0.0f;" +
" float fz=0.0f;" +
" for(int i=0;i<totalN;i++)" +
" { "+
" float x1=positions[0+3*(i)];" +
" float y1=positions[1+3*(i)];" +
" float z1=positions[2+3*(i)];" +
" float dx = x0-x1;" +
" float dy = y0-y1;" +
" float dz = z0-z1;" +
" float r=sqrt(dx*dx+dy*dy+dz*dz+0.01f);" +
" float tr=0.1f/r;" +
" float tr2=tr*tr*tr;" +
" fx+=tr2*dx*0.0001f;" +
" fy+=tr2*dy*0.0001f;" +
" fz+=tr2*dz*0.0001f;" +
" } "+
" forces[0+3*(indis)]+=fx; " +
" forces[1+3*(indis)]+=fy; " +
" forces[2+3*(indis)]+=fz; " +
"}"