Tags: java, lstm, recurrent-neural-network, deeplearning4j, dl4j

DL4J LSTM - Contradictory Errors


I'm trying to create a simple LSTM using Deeplearning4J in Java, with 2 input features and a timeseries length of 1. However, I'm running into an error concerning the number of input dimensions when calling predict().

import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class LSTMRegression {
    public static final int inputSize = 2,
                            lstmLayerSize = 4,
                            outputSize = 1;
    
    public static final double learningRate = 0.0001;

    public static void main(String[] args) {
        int miniBatchSize = 29;
        
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Adam(learningRate))
                .list()
                .layer(0, new LSTM.Builder().nIn(inputSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY).build())
                .layer(1, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SIGMOID).build())
                .layer(2, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SIGMOID).build())
                .layer(3, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY)
                        .nIn(lstmLayerSize).nOut(outputSize).build())
                
                .backpropType(BackpropType.TruncatedBPTT)
                .tBPTTForwardLength(miniBatchSize)
                .tBPTTBackwardLength(miniBatchSize)
                .build();
        
        var network = new MultiLayerNetwork(conf);
        
        network.init();
        network.fit(getTrain());
        
        System.out.println(network.predict(getTest()));
    }
    
    public static DataSet getTest() {
        INDArray input = Nd4j.zeros(29, 2, 1);

        INDArray labels = Nd4j.zeros(29, 1);
        
        return new DataSet(input, labels);
    }
    
    public static DataSet getTrain() {
        INDArray input = Nd4j.zeros(29, 2, 1);
        INDArray labels = Nd4j.zeros(29, 1);
        
        return new DataSet(input, labels);
    }
}

The following error occurs when run:

22:38:28.803 [main] INFO  o.d.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
22:38:29.755 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [29, 2, 1] and labels with shape [29, 1]
Exception in thread "main" java.lang.IllegalStateException: predict(INDArray) method can only be used on rank 2 output - got array with rank 3
    at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:639)
    at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:274)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.predict(MultiLayerNetwork.java:2275)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.predict(MultiLayerNetwork.java:2286)
    at LSTMRegression.main(LSTMRegression.java:78)

That seemed odd, but I tried reshaping the test input to 2D anyway:

    public static DataSet getTest() {
        INDArray input = Nd4j.zeros(29, 2, 1).reshape(29, 2);

        INDArray labels = Nd4j.zeros(29, 1);
        
        return new DataSet(input, labels);
    }

...which leads to the opposite problem:

22:45:28.232 [main] WARN  o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [29, 2, 1] and labels with shape [29, 1]
Exception in thread "main" java.lang.IllegalStateException: 3D input expected to RNN layer expected, got 2
    at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:639)
    at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:265)
    at org.deeplearning4j.nn.layers.recurrent.LSTM.activateHelper(LSTM.java:121)
    at org.deeplearning4j.nn.layers.recurrent.LSTM.activate(LSTM.java:110)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.outputOfLayerDetached(MultiLayerNetwork.java:1349)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2467)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2430)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2421)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2408)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.predict(MultiLayerNetwork.java:2270)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.predict(MultiLayerNetwork.java:2286)
    at LSTMRegression.main(LSTMRegression.java:78)

What exactly am I doing wrong here?

EDIT: I used zeros to keep the code easier to read. Here's what my training and test data actually look like as multidimensional double arrays:

public static DataSet getData() {
        double[][][] inputArray = {
            {{18.7}, {181}},
            {{17.4}, {186}},
            {{18}, {195}},
            {{19.3}, {193}},
            {{20.6}, {190}},
            {{17.8}, {181}},
            {{19.6}, {195}},
            {{18.1}, {193}},
            {{20.2}, {190}},
            {{17.1}, {186}},
            {{17.3}, {180}},
            ...
        };
        double[][] outputArray = {
                {3750},
                {3800},
                {3250},
                {3450},
                {3650},
                {3625},
                {4675},
                {3475},
                {4250},
                {3300},
                {3700},
                {3200},
                {3800},
                {4400},
                {3700},
                {3450},
                {4500},
                ...
        };
        INDArray input = Nd4j.create(inputArray);
        INDArray labels = Nd4j.create(outputArray);
        
        return new DataSet(input, labels);
}

... As well as my test data (updated to only include inputs):

public static INDArray getTest() {
        double[][][] test = new double[][][]{
            {{20}, {203}},
            {{16}, {183}},
            {{20}, {190}},
            {{18.6}, {193}},
            {{18.9}, {184}},
            {{17.2}, {199}},
            {{20}, {190}},
            {{17}, {181}},
            {{19}, {197}},
            {{16.5}, {198}},
            {{20.3}, {191}},
            {{17.7}, {193}},
            {{19.5}, {197}},
            {{20.7}, {191}},
            {{18.3}, {196}},
            {{17}, {188}},
            {{20.5}, {199}},
            {{17}, {189}},
            {{18.6}, {189}},
            {{17.2}, {187}},
            {{19.8}, {198}},
            {{17}, {176}},
            {{18.5}, {202}},
            {{15.9}, {186}},
            {{19}, {199}},
            {{17.6}, {191}},
            {{18.3}, {195}},
            {{17.1}, {191}},
            {{18}, {210}}
        };
        
        INDArray input = Nd4j.create(test);
        
        return input;
}

Solution

  • There are several problems you've got here. If you read the documentation for predict it tells you:

    Usable only for classification networks in conjunction with OutputLayer. Cannot be used with RnnOutputLayer, CnnLossLayer, or networks used for regression.

    That is why the error message says predict only works with rank 2 output: a network that ends in RnnOutputLayer produces rank 3 output, which predict rejects.

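    In your attempted fix you reshape the input to rank 2, and the LSTM layer then complains because it expects rank 3 input.

    The warning logged during training points at a separate problem: with truncated BPTT and an RnnOutputLayer the labels also need to be rank 3, and yours have shape [29, 1].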

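    Use either rnnTimeStep (for single stepping) or output (for the entire sequence) to get the raw output, and then apply argMax yourself if you need class indices. For a regression network like yours the raw output already is the prediction, so no argMax is needed.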

    The output of rnnTimeStep() is just a slice of the full output (rank 2 when the input is rank 2 with shape [miniBatchSize, nIn]), so to get the same result as predict you should be able to use output.argMax(1).toIntVector() on it.

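    The output of output() will be a rank 3 array of shape [miniBatchSize, nOut, timeSeriesLength], so you'll need to specify the correct axes: axis 1 is the output dimension and axis 2 is the time step.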
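
To make the axis bookkeeping concrete, here is a minimal plain-Java sketch with no DL4J dependency. It treats the network output as a double[][][] laid out as [miniBatchSize][nOut][timeSeriesLength], which is an assumption modelled on the shape convention quoted in the warning above. The class and method names (RnnOutputAxes, argMaxOverOutputs, lastStepValues) and the sample numbers are made up for illustration; with DL4J you would call argMax on the INDArray rather than loop by hand.

```java
import java.util.Arrays;

public class RnnOutputAxes {
    // Assumed layout of "out": [miniBatchSize][nOut][timeSeriesLength].

    /** Classification: index of the largest value along axis 1 (nOut), per example and time step. */
    public static int[][] argMaxOverOutputs(double[][][] out) {
        int batch = out.length, nOut = out[0].length, steps = out[0][0].length;
        int[][] classes = new int[batch][steps];
        for (int b = 0; b < batch; b++) {
            for (int t = 0; t < steps; t++) {
                int best = 0;
                for (int k = 1; k < nOut; k++) {
                    if (out[b][k][t] > out[b][best][t]) {
                        best = k;
                    }
                }
                classes[b][t] = best;
            }
        }
        return classes;
    }

    /** Regression with nOut == 1: the raw value at the last time step is the prediction. */
    public static double[] lastStepValues(double[][][] out) {
        double[] values = new double[out.length];
        for (int b = 0; b < out.length; b++) {
            int lastStep = out[b][0].length - 1;
            values[b] = out[b][0][lastStep];
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical classifier output: 2 examples, 3 output units, 1 time step.
        double[][][] classification = {
            {{0.1}, {0.7}, {0.2}},
            {{0.6}, {0.3}, {0.1}}
        };
        System.out.println(Arrays.deepToString(argMaxOverOutputs(classification))); // [[1], [0]]

        // Hypothetical regression output: 2 examples, 1 output unit, 1 time step.
        double[][][] regression = {{{3750.0}}, {{3800.0}}};
        System.out.println(Arrays.toString(lastStepValues(regression))); // [3750.0, 3800.0]
    }
}
```

For the single-output regression network in the question only the second helper is relevant, since argMax over an axis of size 1 always returns 0.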