java opencl gpgpu aparapi

OpenCL compile failed aparapi


Does anyone know what this means and how this could've happened? It looks right to me. I'm trying to use the GPU for a neural network.

This is the error:

!!!!!!! clCreateCommandQueue() failed out of host memory
May 28, 2018 6:36:39 PM com.aparapi.internal.kernel.KernelRunner fallBackToNextDevice
WARNING: Device failed for Util$1, devices={AMD<GPU>|AMD<CPU>|Java Alternative Algorithm|Java Thread Pool}: OpenCL compile failed
com.aparapi.internal.exception.AparapiException: OpenCL compile failed
    at com.aparapi.internal.kernel.KernelRunner.fallBackToNextDevice(KernelRunner.java:1286)
    at com.aparapi.internal.kernel.KernelRunner.executeInternalInner(KernelRunner.java:1550)
    at com.aparapi.internal.kernel.KernelRunner.executeInternalOuter(KernelRunner.java:1351)
    at com.aparapi.internal.kernel.KernelRunner.execute(KernelRunner.java:1342)
    at com.aparapi.Kernel.execute(Kernel.java:2856)
    at com.aparapi.Kernel.execute(Kernel.java:2813)
    at com.aparapi.Kernel.execute(Kernel.java:2753)
    at Util.Util.dotProduct(Util.java:46)
    at Network.FullyConnectedNetwork.predictOutput(FullyConnectedNetwork.java:181)
    at Network.FullyConnectedNetwork.test(FullyConnectedNetwork.java:321)
    at Run.RunFullyConnected.main(RunFullyConnected.java:32)

This is the code that caused the error:

public static double dotProduct(ArrayList<Double> in1, ArrayList<Double> in2) {

        final double[] in1Copy = new double[in1.size()];
        final double[] in2Copy = new double[in1.size()];
        for(int i = 0; i < in1.size(); i++) {
            in1Copy[i] = in1.get(i);
            in2Copy[i] = in2.get(i);
        }

        final double[] result = new double[1];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();
                result[0] += in1Copy[i] + in2Copy[i];
            }
        };

        Range range = Range.create(in1Copy.length);
        kernel.execute(range);
        return result[0];

    }

Solution

  • Your issues all lie on this line:

    result[0] += in1Copy[i] + in2Copy[i];
    

    The biggest issue here is that you're trying to read and write the same memory location (the result variable) from multiple threads at the same time. Even if this were allowed, the outcome would be unpredictable, because concurrent updates can overwrite each other. You are working in a GPGPU environment, where you want to minimize the need for locking. This means you need a map-reduce style approach for this sort of problem. Create a result array of the same size as in1 and in2, perform the addition into that array per thread so that each thread writes only its own element (the map step), then as a second step add all the elements of the array together (the reduce step).

    As a side note, the exception you're getting is unrelated to what I just mentioned. The first line of your log says clCreateCommandQueue() failed out of host memory, so the issue is most likely that you are running on a system that simply doesn't have enough memory. For example, the following runs just fine on my machine, apart from the issue I mentioned in the previous paragraph (I just tested it).

    import com.aparapi.*;
    import org.junit.Test;
    
    public class DotProductTest {
        @Test
        public void dotProduct() {
            // Fill two fixed-size inputs with known values.
            final double[] in1Copy = new double[4096];
            final double[] in2Copy = new double[4096];
            for (int i = 0; i < 4096; i++) {
                in1Copy[i] = i;
                in2Copy[i] = i * 10.0;
            }
    
            final double[] result = new double[1];
    
            Kernel kernel = new Kernel() {
                @Override
                public void run() {
                    int i = getGlobalId();
                    // Still racy: every work item updates result[0].
                    result[0] += in1Copy[i] + in2Copy[i];
                }
            };
    
            Range range = Range.create(in1Copy.length);
            kernel.execute(range);
            System.out.println(result[0]);
        }
    }
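
To make the map-reduce suggestion above concrete, here is a minimal sketch in plain Java rather than Aparapi, so it runs without a GPU. The class name MapReduceDot and its method names are hypothetical, and a parallel IntStream stands in for the GPU threads. It multiplies the elements, on the assumption that a true dot product is wanted; the posted kernel adds them, so swap `*` for `+` to reproduce that arithmetic.

```java
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Aparapi.
class MapReduceDot {

    // Map step: index i writes only products[i], so no two threads
    // ever touch the same element. In an Aparapi kernel, this body
    // would sit in run() with i taken from getGlobalId().
    static double[] map(double[] in1, double[] in2) {
        if (in1.length != in2.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] products = new double[in1.length];
        IntStream.range(0, in1.length)
                 .parallel()
                 .forEach(i -> products[i] = in1[i] * in2[i]);
        return products;
    }

    // Reduce step: sum the per-element results sequentially on the host.
    static double reduce(double[] partial) {
        double sum = 0.0;
        for (double p : partial) {
            sum += p;
        }
        return sum;
    }

    static double dotProduct(double[] in1, double[] in2) {
        return reduce(map(in1, in2));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(dotProduct(a, b)); // 1*4 + 2*5 + 3*6 = 32.0
    }
}
```

The reduce step here is a simple host-side loop. For large arrays a parallel reduction may be worth considering, but that is a separate optimization and not something the answer above requires.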