We are training machine learning models offline and persisting them in Python pickle files.
We were wondering about the best way to embed those pickled models into a stream (e.g. sensorInputStream > PredictionJob > OutputStream).
Apache Flink ML seems to be the right choice for training a model on stream data, but not for referencing an existing model.
Thanks for your response.
Kind regards, Lomungo
There are two possible solutions, depending on the model you are using:
1. Call the model asynchronously from an AsyncFunction.
2. Load the model in a ProcessFunction's open method, and then simply call the model for each element that arrives.
The second solution has two disadvantages: first, you need to have the Java version of the specific library available; second, you need to somehow externalize the metadata of the model if you want to be able to update it over time.
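To make the "load in open, then call per element" idea concrete, here is a minimal sketch in plain Python. It is not Flink code: PredictionFunction is a hypothetical stand-in that only mirrors the open / per-element lifecycle, and the "model" is assumed to be a pickled dict of two parameters rather than a real trained model. In an actual Java job the pickle could not be read directly, which is the first disadvantage mentioned above.

```python
import pickle

# Offline step (assumed): the "model" is just a dict of parameters,
# serialized with pickle the way the question describes.
model_bytes = pickle.dumps({"weight": 2.0, "bias": -1.0})


class PredictionFunction:
    """Hypothetical stand-in for a per-element stream function (not Flink's API)."""

    def __init__(self, model_bytes):
        self.model_bytes = model_bytes
        self.model = None

    def open(self):
        # Called once before any element is processed:
        # deserialize the model here instead of once per element.
        self.model = pickle.loads(self.model_bytes)

    def process_element(self, value):
        # Called for every element: apply the already loaded model.
        score = self.model["weight"] * value + self.model["bias"]
        return 1 if score > 0 else 0


# Streaming step: open once, then call the model for each element.
fn = PredictionFunction(model_bytes)
fn.open()
predictions = [fn.process_element(v) for v in [0.1, 0.7, 0.4, 0.9]]
print(predictions)
```

Because the model is held in a field set by open, updating it later means getting new bytes into the running function somehow, which is the second disadvantage mentioned above.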