python flask pyspark pmml

Save PySpark Pipeline to PMML and Deploy It Using Flask (ERROR in app upon request)


I have been trying to find a way to deploy a trained PySpark pipeline as an API, and I ended up landing on both Flask and PMML as possible solutions.

As far as I am aware, the generation of the PMML file is working: I train the pipeline using ParamGridBuilder, obtain the best model, and spit it out as a .pmml file.

A problem arises, though, when I load the resulting file into Flask. The API starts up just fine; however, when I send it a request, instead of the expected result (the sentiment contained in the text) I get the following error:

[2020-03-02 17:05:15,831] ERROR in app: Exception on /sentiment_analysis [GET]
Traceback (most recent call last):
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/users/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/users/sentiment_analysis.py", line 59, in hello_world
    resultado = evaluator.evaluate(df)
  File "/home/users/.local/lib/python3.6/site-packages/jpmml_evaluator/__init__.py", line 80, in evaluate
    javaArguments = self.backend.dict2map(arguments)
  File "/home/users/.local/lib/python3.6/site-packages/jpmml_evaluator/pyjnius.py", line 31, in dict2map
    raise ValueError()
ValueError
127.0.0.1 - - [02/Mar/2020 17:05:15] "GET /sentiment_analysis?text=test HTTP/1.1" 500 -

Here are the versions of the involved software and packages:

Also, below is the Python code I am using to load the model into Flask.

from flask import Flask, request
import pandas as pd
from jpmml_evaluator import make_evaluator, pyjnius

app = Flask('sentiment_analysis')

@app.route("/sentiment_analysis")
def hello_world():

    text = request.args.get('text')

    pyjnius.jnius_configure_classpath()

    backend = pyjnius.PyJNIusBackend()

    evaluator = make_evaluator(backend, "test.pmml") \
        .verify()

    df = pd.DataFrame(columns=["TWEET"], data=[[text]])

    result = evaluator.evaluate(df)

    sentiment = result.collect()[0]['prediction']

    if int(sentiment) == 0:
        sentiment = 'negative'
    else:
        sentiment = 'positive'

    return 'The sentiment is: ' + sentiment, 200

app.run(host='0.0.0.0', port=5001)

Does anyone know what's wrong here?


Solution

  • Your arguments DataFrame contains a complex column type; the Java backend that you have chosen (PyJNIus) does not know how to map this Python value to a Java value.

    Things you can try if you want to keep going down this roll-your-own Flask API way:

    All things considered, you would be much better off serving your PySpark models using the Openscoring REST web service. There is an up-to-date tutorial available about deploying Apache Spark ML pipeline models as a REST web service.
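If you do stay with the roll-your-own route, one thing worth checking is what exactly gets passed to the evaluator. The traceback shows evaluate() handing its argument to dict2map(), which suggests it expects a flat dict of field name to value for a single record rather than a DataFrame. Below is a minimal sketch of that idea, not a confirmed fix. The helper names build_arguments and non_simple_fields are hypothetical, "TWEET" is the field name taken from the question, and the final evaluate() call is left as a comment because it needs the PMML file and a working Java backend.

```python
# Sketch: build a flat dict of plain values for one record and check it
# before handing it to the evaluator. The helper names are hypothetical;
# "TWEET" is the input field name used in the question.

SIMPLE_TYPES = (str, int, float, bool)


def build_arguments(text, field="TWEET"):
    """Return a one-record argument dict holding only plain Python values."""
    return {field: "" if text is None else str(text)}


def non_simple_fields(arguments):
    """List the keys whose values are not plain scalars (e.g. lists, Series)."""
    return [
        key
        for key, value in arguments.items()
        if value is not None and not isinstance(value, SIMPLE_TYPES)
    ]


arguments = build_arguments("test")
problems = non_simple_fields(arguments)
if problems:
    raise TypeError("Non-scalar values for fields: " + ", ".join(problems))

# Assumption: with the check passed, the dict (not a DataFrame) is what
# would be handed to the evaluator created in the question's code:
# result = evaluator.evaluate(arguments)
```

A check like this at least replaces the bare ValueError with the names of the offending fields, which makes it easier to see whether a whole column object is being sent where a single string was intended.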