Tags: python, machine-learning, random-forest, machine-learning-model, rapids

RAPIDS cuML Random Forest Regression Model Inference


I am using the Random Forest regression model from the cuML 0.10.0 library on Google Colab and am having trouble obtaining model predictions. Training finishes successfully, but when I call the predict method for inference on a very large array of shape (41697600, 11), I receive the following error:

TypeError: GPU predict model only accepts float32 dtype as input, convert the data to float32 or use the CPU predict with `predict_model='CPU'`.

The error persists even after casting the input NumPy array to float32 and passing predict_model='CPU' to the predict method.

Here is the code I am using:

# Cast the test features to float32 before running inference
array = X_test.values.astype('float32')
predictions = cuml_model.predict(array, predict_model='CPU', output_class=False, algo='BATCH_TREE_REORG')

Model summary:

<bound method RandomForestRegressor.print_summary of RandomForestRegressor(n_estimators=10, max_depth=16, handle=<cuml.common.handle.Handle object at 0x7fbfa342e888>, max_features='auto', n_bins=8, n_streams=8, split_algo=1, split_criterion=2, bootstrap=True, bootstrap_features=False, verbose=False, min_rows_per_node=2, rows_sample=1.0, max_leaves=-1, accuracy_metric='mse', quantile_per_tree=False, seed=-1)>

Solution

  • This error message is extremely confusing. I believe it is failing because the model was trained on float64 data, not because of the prediction input. If you train on float32 data instead, this should all work. The optimized GPU implementation of prediction only supports float32 models at this time. You should be able to fall back to the slower CPU prediction, but this exception is blocking it.

    I filed this as a bug, and we'll try to get a fix into the upcoming release. Feel free to follow along there or add any further questions: https://github.com/rapidsai/cuml/issues/1406
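To make the suggested workaround concrete, here is a minimal sketch that casts the training features, training labels, and test features to float32 before fitting and predicting. It assumes the model follows the usual fit/predict convention that cuML's RandomForestRegressor uses; the helper names as_float32 and fit_predict_float32 are hypothetical and not part of cuML.

```python
import numpy as np


def as_float32(data):
    """Return a C-contiguous float32 NumPy array (hypothetical helper)."""
    # pandas DataFrame/Series expose .values; NumPy arrays and lists are used directly
    values = data.values if hasattr(data, "values") else data
    return np.ascontiguousarray(values, dtype=np.float32)


def fit_predict_float32(model, X_train, y_train, X_test):
    """Fit and predict with every input cast to float32 (hypothetical helper).

    Assumes `model` has scikit-learn-style fit(X, y) and predict(X) methods,
    as cuML's RandomForestRegressor does.
    """
    X_train32 = as_float32(X_train)
    y_train32 = as_float32(y_train)  # labels cast too, so X and y share a dtype
    X_test32 = as_float32(X_test)
    # Training on float32 data is what yields a float32 model,
    # which the optimized GPU predict path requires
    model.fit(X_train32, y_train32)
    return model.predict(X_test32)
```

With cuML installed, you would pass a model such as cuml.ensemble.RandomForestRegressor(n_estimators=10, max_depth=16) as the model argument. Casting the labels as well as the features is a precaution, on the assumption that cuML expects both to share the same dtype.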