python machine-learning apache-flink flinkml

Embed an existing ML model in Apache Flink


We train machine learning models offline and persist them as Python pickle files.

We were wondering about the best way to embed those pickled models into a stream (e.g. sensorInputStream > PredictionJob > OutputStream).

Apache Flink ML seems to be the right choice for training a model on streaming data, but not for referencing an existing model.

Thanks for your response.

Kind regards, Lomungo


Solution

  • There are two possible solutions, depending on the model you are using:

    1. Possibly the simplest idea is to create an external service that calls the model and returns the results, and then call that service from the Flink job with an AsyncFunction.
    2. Use a library, again depending on your model, to load the pre-trained model inside a ProcessFunction's open method, and then call the model for each record that arrives.

    The second solution has two disadvantages: first, you need a Java version of the specific library to be available; second, you need to somehow externalize the metadata of the model if you want to be able to update it over time.
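
Since the models in the question are Python pickles, the first option is probably the easier fit. Below is a minimal sketch of such a service using only the Python standard library. It assumes the pickled object exposes a scikit-learn-style `predict(rows)` method; `ThresholdModel`, `load_model`, `start_service`, `request_predictions` and the `{"rows": ...}` / `{"predictions": ...}` JSON shape are all hypothetical names chosen for illustration. The small Python client only stands in for the HTTP call that the Flink job would make from its AsyncFunction.

```python
import http.client
import json
import pickle
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ThresholdModel:
    """Hypothetical stand-in for a real pickled model."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # Label a row 1 when the mean of its features exceeds the threshold.
        return [int(sum(row) / len(row) > self.threshold) for row in rows]


def load_model(path):
    """Load a pickled model from disk. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def make_handler(model):
    """Build a request handler class bound to an already loaded model."""

    class PredictionHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            # A NumPy result would need .tolist() before JSON encoding.
            predictions = list(model.predict(payload["rows"]))
            body = json.dumps({"predictions": predictions}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return PredictionHandler


def start_service(model, host="127.0.0.1", port=0):
    """Serve the model in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(model))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_predictions(host, port, rows):
    """Send one batch of rows to the service and return its predictions."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request(
            "POST",
            "/predict",
            body=json.dumps({"rows": rows}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        response = connection.getresponse()
        return json.loads(response.read())["predictions"]
    finally:
        connection.close()
```

In practice you would call `start_service(load_model("model.pkl"))` (the file name is a placeholder) and point the Flink job at the resulting host and port. Because the model lives outside the job, you can swap it by restarting or redeploying the service without touching the stream.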