java, deeplearning4j

DL4J Load Array of Arrays as DataSet


If I want to load a DataSet from an array of doubles (e.g. {1, 2, 3, 4}) in a text file with DL4J, I use this code:

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

int numLinesToSkip = 0;
char delimiter = ',';
String filePath = "data.txt";
File file = new File(filePath);
Path path = file.toPath();
long lineCount = Files.lines(path).count(); // one example per line

RecordReader recordReader = new CSVRecordReader(numLinesToSkip, delimiter);
recordReader.initialize(new FileSplit(file));

int variables = 2;
int labelIndex = 50;             // column index of the label in each row
int numClasses = variables;
int batchSize = (int) lineCount; // load the whole file as a single batch

DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, numClasses);

DataSet allData = iterator.next();
allData.shuffle();
// DataSet is now created

I would like to do the same, but with an array of arrays (a 2D array), such as: {{1, 1, 1}, {2, 2, 2}, {3, 4, 5}, {6, 6, 6}}, etc.

I can format the file however I like. I would simply like to be able to load an array per item rather than a single double; in other words, an array of arrays rather than an array of doubles.


Solution

  • A DataSet can be created from two INDArrays as well. If you just want to skip CSVs and use your own arrays directly, you can do something like:

    // Each row is one example; input rows and label rows are matched by index
    INDArray yourInput = Nd4j.create(new double[][]{{1, 1, 1}, {2, 2, 2}});
    INDArray yourLabels = Nd4j.create(new double[][]{{1, 0}, {0, 1}}); // e.g. one-hot labels
    DataSet d = new DataSet(yourInput, yourLabels);
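
    From there you can treat it like any other DataSet. As a minimal sketch, assuming you want the same shuffle step as the CSV version plus a hypothetical 80/20 train/test split:

    d.shuffle();
    SplitTestAndTrain split = d.splitTestAndTrain(0.8); // hypothetical 80/20 split
    DataSet train = split.getTrain();
    DataSet test = split.getTest();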
    

    Generally, the abstraction you're using assumes each example is one row. If you're trying to pass 2D data per example, the only common use case for that is time series. Otherwise we enforce one row per example, because the convention is to set a batch size at training/inference time, and to work with that batch size we can't really allow more than one row per example. There has to be standardization somewhere in order for us to perform certain tasks for you, like batching.
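
    For the time-series case mentioned above, the convention is a 3D layout of [miniBatchSize, numFeatures, timeSteps]. A minimal sketch, with made-up sizes (4 sequences, 3 features, 10 time steps, 2 classes) and zero-filled placeholder values:

    INDArray tsFeatures = Nd4j.zeros(new int[]{4, 3, 10}); // [miniBatch, features, timeSteps]
    INDArray tsLabels = Nd4j.zeros(new int[]{4, 2, 10});   // per-time-step labels over 2 classes
    DataSet timeSeriesData = new DataSet(tsFeatures, tsLabels);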