java, dl4j

The XOR neural network written in DL4J does not work


I'm starting to learn about neural networks with the DL4J framework and began with XOR training. But no matter what I do, I get wrong results.

    MultiLayerConfiguration networkConfiguration = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.SIGMOID_UNIFORM)
            .list()
            .layer(new DenseLayer.Builder()
                    .nIn(2).nOut(2)
                    .activation(Activation.SIGMOID)
                    .build())
            .layer( new DenseLayer.Builder()
                    .nIn(2).nOut(2)
                    .activation(Activation.SIGMOID)
                    .build())
            .layer( new OutputLayer.Builder()
                    .nIn(2).nOut(1)
                    .activation(Activation.SIGMOID)
                    .lossFunction(LossFunctions.LossFunction.XENT)
                    .build())
            .build();

    MultiLayerNetwork network = new MultiLayerNetwork(networkConfiguration);
    network.setListeners(new ScoreIterationListener(1));
    network.init();


    INDArray input = Nd4j.createFromArray(new double[][]{{0,1},{0,0},{1,0},{1,1}});

    INDArray output = Nd4j.createFromArray(new double[][]{{0^1},{0^0},{1^0},{1^1}});
    //   INDArray output = Nd4j.createFromArray(new double[]{0^1,0^0,1^1,1^0});
    //DataSet dataSet = new org.nd4j.linalg.dataset.DataSet(input,output);

    for (int i = 0; i < 10000; i++) {
        network.fit(input,output);
    }


    INDArray res = network.output(input,false);

    System.out.print(res);

Learning result:

[[0.5748], 
 [0.5568], 
 [0.4497], 
 [0.4533]]

Solution

  • That looks like an old example. Where did you get it from? Note that the project does not endorse or support random examples people pull from elsewhere. If this is from the book, please note that those examples are a few years old at this point and should not be used.

    This should be the latest one: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/quickstart/modeling/feedforward/classification/ModelXOR.java

    This configuration suffers from what I like to call the "toy problem syndrome". DL4J assumes minibatch training by default and therefore normalizes the gradient updates relative to the minibatch size of the input examples. This is how 99% of problems are set up if you do anything in the real world.

    This means that each step the net takes is not the full step it would take on a toy problem where the whole dataset fits in memory. Our latest example handles this by turning minibatching off:

          MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .updater(new Sgd(0.1))
                .seed(seed)
                .biasInit(0) // init the bias with 0 - empirical value, too
                // The networks can process the input more quickly and more accurately by ingesting
                // minibatches 5-10 elements at a time in parallel.
                // This example runs better without, because the dataset is smaller than the mini batch size
                .miniBatch(false)
                .list()
                .layer(new DenseLayer.Builder()
                    .nIn(2)
                    .nOut(4)
                    .activation(Activation.SIGMOID)
                    // random initialize weights with values between 0 and 1
                    .weightInit(new UniformDistribution(0, 1))
                    .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nOut(2)
                    .activation(Activation.SOFTMAX)
                    .weightInit(new UniformDistribution(0, 1))
                    .build())
                .build();
    

    Note the .miniBatch(false) in the configuration.
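
    For comparison, here is a minimal, self-contained sketch of how the same fix might be applied to the single-output architecture from the question: keep the SIGMOID activations and XENT loss, add an explicit Sgd updater, and turn minibatching off. This is not from the linked example; the learning rate (0.5), seed, and 4 hidden units are assumptions that may need tuning.

          import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
          import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
          import org.deeplearning4j.nn.conf.layers.DenseLayer;
          import org.deeplearning4j.nn.conf.layers.OutputLayer;
          import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
          import org.deeplearning4j.nn.weights.WeightInit;
          import org.nd4j.linalg.activations.Activation;
          import org.nd4j.linalg.api.ndarray.INDArray;
          import org.nd4j.linalg.factory.Nd4j;
          import org.nd4j.linalg.learning.config.Sgd;
          import org.nd4j.linalg.lossfunctions.LossFunctions;

          public class XorSketch {
              public static void main(String[] args) {
                  // Same idea as the question's network, plus an explicit updater and miniBatch(false).
                  MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                          .seed(42)                       // assumed seed, for reproducibility
                          .updater(new Sgd(0.5))          // assumed learning rate for this tiny problem
                          .miniBatch(false)               // the key fix: no gradient normalization by batch size
                          .weightInit(WeightInit.SIGMOID_UNIFORM)
                          .list()
                          .layer(new DenseLayer.Builder()
                                  .nIn(2).nOut(4)         // a few more hidden units than the original 2
                                  .activation(Activation.SIGMOID)
                                  .build())
                          .layer(new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
                                  .nIn(4).nOut(1)
                                  .activation(Activation.SIGMOID)
                                  .build())
                          .build();

                  MultiLayerNetwork net = new MultiLayerNetwork(conf);
                  net.init();

                  INDArray input  = Nd4j.createFromArray(new double[][]{{0, 1}, {0, 0}, {1, 0}, {1, 1}});
                  INDArray labels = Nd4j.createFromArray(new double[][]{{1}, {0}, {1}, {0}});

                  for (int i = 0; i < 10000; i++) {
                      net.fit(input, labels);
                  }

                  System.out.println(net.output(input, false)); // expected to approach 1, 0, 1, 0
              }
          }

    With minibatching off, each fit() call applies the full gradient computed from all four examples, so the outputs should be able to move toward 0 and 1 instead of hovering around 0.5 as in the result above.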