java nd4j

Create an INDArray (ND4J) from a CSV file or a Spark DataFrame


I want to use ND4J purely as a linear algebra library and would like to create an INDArray directly from either a CSV file or a Spark DataFrame, without first building an intermediate Java collection on-heap. I searched everywhere but couldn't find a solution.


Solution

  • Nd4j maintainer here. If you just want to read the data, use a map over the Spark DataFrame, have each step create a row, and then combine the rows into the final result using Nd4j.concat(..).

    If you can, consider converting to Arrow first and then using the nd4j-arrow converter. It uses the tensor abstraction, which didn't get as much support as I had hoped. You can find that here: https://github.com/deeplearning4j/deeplearning4j/blob/master/nd4j/nd4j-serde/nd4j-arrow/src/main/java/org/nd4j/arrow/ArrowSerde.java#L42

    Feel free to file an issue over on the dl4j repository and I can take a look at other recommendations for you as well.
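
For the CSV case, one possible route is sketched below. It is not the per-row Nd4j.concat(..) approach from the answer, and it is not the Arrow path either: it streams a headerless, purely numeric CSV line by line into a single row-major double[] and records the shape. The names CsvToFlatBuffer, Flat and read are hypothetical and used only for illustration. The sketch depends only on the JDK; the final hand-off to ND4J appears as a comment and assumes the Nd4j.create(double[], int[]) factory overload is available in your ND4J version. Note that the data still passes through one primitive array on the JVM heap, so this avoids a boxed collection but is not a zero-copy solution.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class CsvToFlatBuffer {

    /** Row-major matrix values together with their shape (hypothetical helper type). */
    public static final class Flat {
        public final double[] data;
        public final int rows;
        public final int cols;

        Flat(double[] data, int rows, int cols) {
            this.data = data;
            this.rows = rows;
            this.cols = cols;
        }
    }

    /** Streams a headerless numeric CSV into one growing row-major buffer. */
    public static Flat read(Reader source) {
        double[] buf = new double[1024];
        int size = 0;
        int rows = 0;
        int cols = -1; // unknown until the first data line is seen
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cells = line.split(",");
                if (cols < 0) {
                    cols = cells.length;
                } else if (cells.length != cols) {
                    throw new IllegalArgumentException(
                            "Expected " + cols + " columns but found " + cells.length);
                }
                if (size + cols > buf.length) {
                    // Grow geometrically so appending stays amortized O(1) per value.
                    buf = Arrays.copyOf(buf, Math.max(buf.length * 2, size + cols));
                }
                for (String cell : cells) {
                    buf[size++] = Double.parseDouble(cell.trim());
                }
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Trim the buffer to the number of values actually read.
        return new Flat(Arrays.copyOf(buf, size), rows, Math.max(cols, 0));
    }

    public static void main(String[] args) {
        Flat m = read(new StringReader("1,2,3\n4,5,6\n"));
        System.out.println(m.rows + " x " + m.cols + " " + Arrays.toString(m.data));
        // Assumption: with ND4J on the classpath, the buffer could be wrapped as a matrix:
        // INDArray matrix = Nd4j.create(m.data, new int[]{m.rows, m.cols});
    }
}
```

For a real file, a java.io.FileReader (or Files.newBufferedReader) can be passed in place of the StringReader. For a Spark DataFrame the same idea would have to run per partition, with the per-partition results combined afterwards, for example with Nd4j.concat(..) as the answer suggests.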