python typeerror rapids cuml user-warning

cuML UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan


I am trying to train a random forest (RF) regressor using GridSearchCV. I cast all the relevant columns to float32, but I still get these warnings and I'm not sure how to solve them.

My code:

# Assumed imports (not shown in the original post): cpd is cuDF and
# cuRFr is cuML's random forest regressor
import cudf as cpd
from cuml.ensemble import RandomForestRegressor as cuRFr
from sklearn.model_selection import GridSearchCV, PredefinedSplit

combined_df = cpd.concat([train_df, evaluate_df])
combined_df = combined_df.astype({
    'Mcap_w': 'float32',
    'constant': 'int32',
    'TotalAssets': 'float32',
    'NItoCommon_w': 'float32',
    'NIbefEIPrefDiv_w': 'float32',
    'PrefDiv_w': 'float32',
}, errors='raise')

print(combined_df.iloc[:, 2:].info(), combined_df['Mcap_w'])

# Rows from train_df get fold 0, rows from evaluate_df get fold 1
test_fold = [0] * len(train_df) + [1] * len(evaluate_df)

#p_test3 = {'n_estimators':[50,100,200,300,500],'max_depth':[3,4,5,6,7,8], 'max_features':[5,10,15,21]}
p_test3 = {'n_estimators': [20, 50, 200, 500], 'max_depth': [3, 5, 7, 10], 'max_features': [25]}

tuning = GridSearchCV(
    estimator=cuRFr(n_streams=1, min_samples_split=2, min_samples_leaf=1, random_state=0),
    param_grid=p_test3, scoring='r2', cv=PredefinedSplit(test_fold=test_fold))
tuning.fit(combined_df.iloc[:, 2:], combined_df['Mcap_w'])
print(tuning.best_score_)
tuning.cv_results_, tuning.best_params_, tuning.best_score_
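To make the effect of `test_fold` concrete, here is a CPU-only sketch using scikit-learn alone, since `GridSearchCV` and `PredefinedSplit` come from scikit-learn either way. `p_test3` is copied from the question; `n_train` and `n_eval` are hypothetical stand-ins for `len(train_df)` and `len(evaluate_df)`. It shows that `[0] * n_train + [1] * n_eval` defines two splits (each block is the test set once), which together with the 16 parameter combinations means 32 fits. It also shows that marking the training rows with `-1`, which `PredefinedSplit` treats as "never in a test set", leaves a single split that trains on the first block and scores on the second; whether that is the intended setup is an assumption.

```python
from sklearn.model_selection import ParameterGrid, PredefinedSplit

# Hypothetical sizes standing in for len(train_df) and len(evaluate_df)
n_train, n_eval = 6, 4

# Grid copied from the question: 4 * 4 * 1 combinations
p_test3 = {'n_estimators': [20, 50, 200, 500], 'max_depth': [3, 5, 7, 10], 'max_features': [25]}

# As in the question: every row belongs to some test fold (fold 0 or fold 1)
two_way = PredefinedSplit(test_fold=[0] * n_train + [1] * n_eval)
n_candidates = len(ParameterGrid(p_test3))
n_fits = n_candidates * two_way.get_n_splits()
print(n_candidates, two_way.get_n_splits(), n_fits)  # 16 2 32

for train_idx, test_idx in two_way.split():
    print("train:", train_idx, "test:", test_idx)

# Alternative: -1 keeps the first block out of every test fold,
# leaving one split (train on the first block, score on the second)
one_way = PredefinedSplit(test_fold=[-1] * n_train + [0] * n_eval)
train_idx, test_idx = next(one_way.split())
print(one_way.get_n_splits(), train_idx, test_idx)
```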

The print output:

 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   TotalAssets                   896 non-null    float32
 1   NItoCommon_w                  896 non-null    float32
 2   NIbefEIPrefDiv_w              896 non-null    float32

dtypes: float32(3)
memory usage: 101.5 KB
None

The second argument, combined_df['Mcap_w'], prints as a Series ending with: Name: Mcap_w, Length: 896, dtype: float32

Then I get 32 warnings, each followed by a TypeError, because GridSearchCV scores every parameter combination on every fold (16 combinations × 2 folds):

miniconda3/envs/rapid/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:988: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan.

TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU matrix, consider using .to_cupy() To explicitly construct a host matrix, consider using .to_numpy().


  File "miniconda3/envs/rapid/lib/python3.10/site-packages/cudf/core/frame.py", line 402, in __array__
    raise TypeError(
TypeError: Implicit conversion to a host NumPy array via __array__ is not allowed, To explicitly construct a GPU matrix, consider using .to_cupy() To explicitly construct a host matrix, consider using .to_numpy().
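The warning and the TypeError are two halves of one failure. With scoring='r2', scikit-learn scores each fit by calling r2_score on the true targets and the predictions, and r2_score converts both with an implicit `np.asarray` (which calls the object's `__array__`). cuDF raises TypeError from `__array__` instead of copying to the host, and cuML returns predictions in the same type as its input, so with cuDF inputs both arguments are cuDF objects. The sketch below needs no GPU: `NoImplicitHostCopy` is a hypothetical stand-in that mimics this cuDF behaviour, not a real cuDF class.

```python
import numpy as np
from sklearn.metrics import r2_score


class NoImplicitHostCopy:
    """Hypothetical stand-in for a cuDF Series: holds data but refuses implicit np.asarray()."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=np.float32)

    def __len__(self):
        return len(self._values)

    def __array__(self, dtype=None, copy=None):
        # Mimics cuDF: implicit host conversion raises instead of copying
        raise TypeError("Implicit conversion to a host NumPy array via __array__ is not allowed")

    def to_numpy(self, dtype=None):
        # Explicit conversion is allowed, like cuDF's .to_numpy()
        return self._values if dtype is None else self._values.astype(dtype)


y_true = NoImplicitHostCopy([3.0, 1.0, 4.0, 1.5])
y_pred = NoImplicitHostCopy([2.5, 1.0, 4.5, 1.0])

# What the 'r2' scorer effectively does with cuDF targets and cuDF predictions
scoring_failed = False
try:
    r2_score(y_true, y_pred)
except TypeError as exc:
    scoring_failed = True
    print("scoring failed:", exc)

# Converting explicitly first lets the metric run on host arrays
score = r2_score(y_true.to_numpy(), y_pred.to_numpy())
print(score)
```

GridSearchCV catches the exception, emits the "Scoring failed" warning, and records nan for that split, which is why the search itself does not crash.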


Solution

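  • The error message already points at the fix: to explicitly construct a GPU matrix, use .to_cupy(); to explicitly construct a host matrix, use .to_numpy().

    However, that's a little confusing, because the failing conversion doesn't happen in your own code. With scoring='r2', GridSearchCV scores each fit with scikit-learn's r2_score, which implicitly converts the true targets and the predictions to NumPy arrays. cuML returns predictions in the same type as its input, so with cuDF inputs both of those are cuDF objects, and cuDF refuses the implicit conversion. Passing host NumPy arrays (float32, matching the cast above) to fit avoids the problem.

    Thus the fit call should look like this: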

    # Fit on combined_df (not train_df) so the rows line up with test_fold
    tuning.fit(combined_df.iloc[:, 2:].to_numpy(dtype='float32'),
               combined_df['Mcap_w'].to_numpy(dtype='float32'))
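For a quick CPU-only check of this flow, here is a sketch in which pandas stands in for cuDF (both offer `.to_numpy(dtype=...)`) and scikit-learn's RandomForestRegressor stands in for cuML's. The data, the `make_frame` helper and the smaller grid are hypothetical; the column layout mirrors the question (target, a constant, then three features). It exercises two points: `X` and `y` are explicit float32 NumPy arrays built from the same concatenated frame, and `test_fold` has exactly one entry per row passed to `fit`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, PredefinedSplit

rng = np.random.default_rng(0)


def make_frame(n):
    # Hypothetical frame: target, a constant, then three feature columns
    feats = rng.normal(size=(n, 3))
    target = feats @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)
    return pd.DataFrame({
        'Mcap_w': target,
        'constant': 1,
        'TotalAssets': feats[:, 0],
        'NItoCommon_w': feats[:, 1],
        'NIbefEIPrefDiv_w': feats[:, 2],
    })


train_df, evaluate_df = make_frame(80), make_frame(20)
combined_df = pd.concat([train_df, evaluate_df])
test_fold = [0] * len(train_df) + [1] * len(evaluate_df)

# Explicit host arrays: the scorer only ever sees NumPy inputs
X = combined_df.iloc[:, 2:].to_numpy(dtype='float32')
y = combined_df['Mcap_w'].to_numpy(dtype='float32')
assert len(test_fold) == X.shape[0]  # one fold label per row passed to fit

tuning = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={'n_estimators': [10, 20], 'max_depth': [3, 5]},
    scoring='r2', cv=PredefinedSplit(test_fold=test_fold))
tuning.fit(X, y)
print(tuning.best_params_, tuning.best_score_)
```

If `test_fold` were built from the combined frame but `fit` received only the training rows, the split indices would point past the end of `X`, which the length check above guards against.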