scala linear-regression apache-flink flinkml

Flink Multiple Linear Regression: does it have Predict?


I have trained a multiple linear regression model and now want to use it for prediction.

From the documentation, I understand that the input is a DataSet of LabeledVector and the output is a DataSet of tuples (input value, predicted value). Is that right?

I create my labeled vectors:

val mapped = data.map { x =>
  new org.apache.flink.ml.common.LabeledVector(x._4, org.apache.flink.ml.math.DenseVector(x._1, x._2, x._3))
}

// Output
mapped: org.apache.flink.api.scala.DataSet[org.apache.flink.ml.common.LabeledVector] = org.apache.flink.api.scala.DataSet@7d4fefdc
LabeledVector(6.7, DenseVector(33.0, -52.26, 28.3))
LabeledVector(5.8, DenseVector(36.0, 45.53, 150.93))
.....

With the model created and trained, I run the prediction:

// Calculate the predictions for the test data
val predictions = mlr.predict(mapped)

I get this error:

java.lang.RuntimeException: There is no PredictOperation defined for org.apache.flink.ml.regression.MultipleLinearRegression which takes a DataSet[org.apache.flink.ml.common.LabeledVector] as input.

But, as you can see here, the official documentation says that it exists.

Thanks for your help! :)


Solution

  • The prediction of LabeledVectors has been removed with this commit. Unfortunately, the Flink documentation has not been updated. I've created an issue to update the documentation.

    If you want to predict LabeledVectors, then you have to write your own PredictOperation which supports the respective types.
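    If the labels are not needed in the output, a simpler route than a custom PredictOperation is to strip them and predict on the plain vectors. The following is a minimal sketch, assuming the legacy flink-ml Scala library (Flink 1.x) in which MultipleLinearRegression still provides a prediction operation for Vector inputs. The object name PredictWithoutLabels, the two sample rows (taken from the question's output), and the default hyperparameters are illustrative only and not tuned.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.math.{DenseVector, Vector}
import org.apache.flink.ml.regression.MultipleLinearRegression

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object PredictWithoutLabels {

  def run(): Seq[(Vector, Double)] = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Stand-in for the question's `mapped` data set.
    val mapped: DataSet[LabeledVector] = env.fromElements(
      LabeledVector(6.7, DenseVector(33.0, -52.26, 28.3)),
      LabeledVector(5.8, DenseVector(36.0, 45.53, 150.93))
    )

    // Training still works on LabeledVector; only prediction does not.
    // Default hyperparameters are used here and are NOT tuned for this data.
    val mlr = MultipleLinearRegression()
    mlr.fit(mapped)

    // Drop the label so the input is a DataSet[Vector] rather than
    // a DataSet[LabeledVector].
    val features: DataSet[Vector] = mapped.map(_.vector)

    // Assumption: for Vector inputs, predict yields (input vector, prediction) pairs.
    val predictions: DataSet[(Vector, Double)] = mlr.predict(features)

    predictions.collect()
  }
}
```

    Calling PredictWithoutLabels.run() should return one (vector, prediction) pair per input row. The original labels are dropped in this approach, so comparing predictions against them would need extra work that is not shown here.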