I have built an XGBoost model in R and exported it to PMML (with r2pmml).
I have scored the same dataset both in R and via the PMML file (with Java); the output probabilities are very close, but they all differ by a small amount, between 1e-10 and 1e-8.
These differences are too small to be caused by an issue with the input data.
Is this typical rounding behaviour between different languages/software, or did I make a mistake somewhere?
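
For reference, here is the kind of check I am running (a minimal sketch with made-up numbers; the real arrays come from R's predict() and from scoring the PMML file in Java):

```java
public class CompareProbabilities {

    public static void main(String[] args) {
        // Made-up stand-ins for the real outputs: rProbs from R's predict(),
        // pmmlProbs from scoring the same rows with the PMML file in Java.
        double[] rProbs    = {0.12345678, 0.87654321, 0.50000012};
        double[] pmmlProbs = {0.12345679, 0.87654322, 0.50000011};

        // Largest absolute difference across all rows.
        double maxDiff = 0.0;
        for (int i = 0; i < rProbs.length; i++) {
            maxDiff = Math.max(maxDiff, Math.abs(rProbs[i] - pmmlProbs[i]));
        }
        System.out.println("max abs difference: " + maxDiff); // on the order of 1e-8
    }
}
```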
> the output probabilities are very close, but they all differ by a small amount, between 1e-10 and 1e-8.
The XGBoost library uses the float32 data type (single-precision floating point), which has a "natural precision" of around 1e-7 to 1e-8 in this value range (probabilities between 0 and 1).
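
You can see this precision limit directly by asking for the spacing between adjacent float32 values (the ULP) in that range; a quick Java check, with the probe values chosen here being arbitrary:

```java
public class Float32Spacing {

    public static void main(String[] args) {
        // Math.ulp(x) returns the gap between x and the next representable
        // float32 value; no float32 computation can be more accurate than this.
        System.out.println(Math.ulp(0.01f)); // ~9.3e-10
        System.out.println(Math.ulp(0.5f));  // ~6.0e-8
        System.out.println(Math.ulp(0.99f)); // ~6.0e-8
    }
}
```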
So, your observed difference is smaller than this "natural precision", and should not be a cause for further concern.
The (J)PMML representation carries out exactly the same computations as the native XGBoost representation (summing the booster's float values, then applying a normalization function to the result).
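
To illustrate the magnitude involved (a sketch with a made-up margin value, not taken from any real model), nudging the accumulated float32 margin by a single ULP before the logistic normalization already moves the final probability by roughly 1e-8:

```java
public class UlpEffect {

    public static void main(String[] args) {
        // Made-up accumulated booster margin (the sum of the tree scores).
        float margin = 0.7321f;

        // The nearest representable float32 neighbour, one ULP away.
        float perturbed = Math.nextUp(margin);

        // Logistic normalization, as used for binary:logistic objectives.
        double p1 = 1.0 / (1.0 + Math.exp(-margin));
        double p2 = 1.0 / (1.0 + Math.exp(-perturbed));

        // The difference is on the order of 1e-8, the same magnitude as observed.
        System.out.println(Math.abs(p2 - p1));
    }
}
```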