Tags: r, xgboost, pmml

Minimal differences between R and PMML XGBoost probability outputs


I have built an XGBoost model in R and exported it to PMML (with r2pmml).
I have scored the same dataset with both R and the PMML file (via Java). The probability outputs are very close, but they all differ by somewhere between 1e-8 and 1e-10. These differences are too small to be caused by an issue with the input data.

Is this classic rounding behaviour between different languages/software, or did I make a mistake somewhere?


Solution

  • the probability outputs are very close, but they all have a small difference between 1e-8 and 1e-10.

    The XGBoost library uses the float32 data type (single-precision floating-point), which has a "natural precision" of around 1e-7 .. 1e-8 in this range (probability values between 0 and 1).

    So, your observed difference is smaller than this "natural precision", and should not be a cause for further concern.

    The (J)PMML representation carries out exactly the same computations as the native XGBoost representation: summing the booster's float32 leaf values, then applying a normalization function to that sum.
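    The scale of that "natural precision" is easy to check on the Java (PMML scoring) side. The sketch below is not JPMML code — the class, the random "leaf values", and the method names are all hypothetical — but it shows two things: the gap between adjacent float32 values near 0.5 is about 6e-8, and merely summing the same float32 values in a different order typically shifts the resulting probability at the 1e-8 .. 1e-10 level you observed.

    ```java
    // Sketch only (not JPMML internals): why two float32 scoring
    // pipelines can disagree at the ~1e-8 level.
    public class Float32Precision {

        // Sum float32 values in forward or reverse order; rounding
        // error accumulates differently depending on the order.
        static float sum(float[] xs, boolean reversed) {
            float total = 0f;
            if (reversed) {
                for (int i = xs.length - 1; i >= 0; i--) total += xs[i];
            } else {
                for (float x : xs) total += x;
            }
            return total;
        }

        // Logistic normalization, as applied for binary:logistic models.
        static double sigmoid(double margin) {
            return 1.0 / (1.0 + Math.exp(-margin));
        }

        public static void main(String[] args) {
            // Gap between adjacent float32 values near 0.5: 2^-24, about 6e-8.
            System.out.println("ulp(0.5f) = " + Math.ulp(0.5f));

            // Hypothetical per-tree leaf values for one scored row.
            java.util.Random rng = new java.util.Random(42);
            float[] leaves = new float[500];
            for (int i = 0; i < leaves.length; i++) {
                leaves[i] = (rng.nextFloat() - 0.5f) * 0.1f;
            }

            float forward = sum(leaves, false);
            float backward = sum(leaves, true);
            double diff = Math.abs(sigmoid(forward) - sigmoid(backward));

            // Typically a tiny but nonzero difference, below ulp(0.5f).
            System.out.println("probability diff = " + diff);
        }
    }
    ```

    In other words, a discrepancy well under one float32 ulp of the output value is consistent with both implementations doing the same arithmetic, differing only in evaluation order or intermediate rounding.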