Tags: python, onnx, onnxruntime

ONNX performance compared to sklearn


I have converted a sklearn logistic regression model to an ONNX model and noticed that scoring with ONNX takes significantly longer than scoring with sklearn's .predict() method. I feel like I must be doing something wrong, because ONNX is billed as an optimized prediction solution. The difference is more noticeable with larger datasets, so I created X_large_dataset as a proxy for large batch processing.

import datetime

import numpy as np
import onnxruntime as rt
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# create training data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

# fit a logistic regression model
clr = LogisticRegression()
clr.fit(X_train, y_train)

# convert to onnx format
initial_type = [('float_input', FloatTensorType([None, 4]))]
onx = convert_sklearn(clr, initial_types=initial_type)
with open("logreg_iris.onnx", "wb") as f:
    f.write(onx.SerializeToString())
    
# create inference session from onnx object
sess = rt.InferenceSession(
    "logreg_iris.onnx", providers=rt.get_available_providers())
input_name = sess.get_inputs()[0].name

# create a larger dataset as a proxy for large batch processing
X_large_dataset = np.array([[1, 2, 3, 4]]*10_000_000)
start = datetime.datetime.now()
pred_onx = sess.run(None, {input_name: X_large_dataset.astype(np.float32)})[0]
end = datetime.datetime.now()
print("onnx scoring time:", end - start)

# compare to scoring directly with model object
start = datetime.datetime.now()
pred_sk = clr.predict(X_large_dataset)
end = datetime.datetime.now()
print("sklearn scoring time:", end - start)

On my machine, this snippet shows sklearn's predict finishing in under a second while the ONNX session takes about 18 seconds.


Solution

  • Simply converting a model to ONNX does not mean that it will automatically perform better. During conversion, ONNX tries to optimize the computational graph, for example by removing calculations that do not contribute to the output, or by fusing separate layers into a single operator. For a generic neural network consisting of convolution, normalization, and nonlinearity layers, these optimizations often result in higher throughput and better performance.

    So considering you are exporting just a LogisticRegression, most likely both the sklearn and the corresponding ONNX implementations are already very optimized, and the conversion will not lead to any performance gain. (A short sketch of onnxruntime's graph-optimization switch follows the points below.)

    As to why InferenceSession.run is roughly 20x slower than sklearn.predict:

    1. X_large_dataset is a np.int64 array over 300 MB in size. Casting it with astype while building the input dictionary passed to run creates a new ~150 MB float32 array into which everything is copied. That copy obviously shouldn't be counted towards the model execution time.
    2. onnxruntime has quite a bit of memory-management overhead when executing a model with dynamic input shapes for the first time. Subsequent calls to run with inputs of the same shape should finish a lot faster. The benchmarking sketch below removes both of these effects.
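
    On the graph-optimization point: onnxruntime exposes the optimization level through SessionOptions. For a model that is a single LogisticRegression there is essentially nothing to fuse, so this knob will not change much here, but it is the switch those optimizations run through. A minimal sketch, reusing the logreg_iris.onnx file from the question:

    import onnxruntime as rt

    # request all available graph optimizations (constant folding,
    # node fusion, ...) when the session is created
    sess_options = rt.SessionOptions()
    sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL

    sess = rt.InferenceSession(
        "logreg_iris.onnx",
        sess_options,
        providers=rt.get_available_providers())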
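
    To measure what the session itself costs, here is a sketch of a fairer timing loop: the cast to float32 happens once, outside the timed region, and a warm-up call lets onnxruntime do its one-time allocations before measurement. It assumes sess, input_name, and X_large_dataset from the snippet in the question:

    import datetime
    import numpy as np

    # cast once, up front, so the copy is not counted as scoring time
    X_float = X_large_dataset.astype(np.float32)

    # warm-up call: the first execution pays onnxruntime's one-time
    # memory-management overhead for this input shape
    sess.run(None, {input_name: X_float})

    start = datetime.datetime.now()
    pred_onx = sess.run(None, {input_name: X_float})[0]
    end = datetime.datetime.now()
    print("onnx scoring time (warm, pre-cast):", end - start)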