python, machine-learning, scikit-learn, classification, grid-search

Scikit-learn classifier with custom scorer dependent on a training feature


I am trying to train a RandomForestClassifier with a custom scorer whose output needs to be dependent on one of the features.

The X dataset contains 18 features.

y is the usual array of 0s and 1s.

The RandomForestClassifier with custom scorer is used within a GridSearchCV instance: GridSearchCV(classifier, param_grid=[...], scoring=custom_scorer).

The custom scorer is defined via scikit-learn's make_scorer function: custom_scorer = make_scorer(custom_scorer_function, greater_is_better=True).

This framework is very straightforward if custom_scorer_function depends only on y_true and y_pred. However, in my case I need to define a scorer that makes use of one of the 18 features contained in the X dataset, i.e. depending on the values of y_pred and y_true, the custom score will be a combination of them and the feature.

My question is how can I pass the feature into the custom_scorer_function given that its standard signature accepts y_true and y_pred?

I am aware that it accepts extra **kwargs, but passing the entire feature array this way doesn't solve the problem, as the function is invoked for each pair of y_true and y_pred values (I would need to extract the individual feature values corresponding to them for this to work, and I am not sure that can be done).

I have tried to augment the y_true array by packing that feature into it and unpacking it within custom_scorer_function (the 1st column holds the actual labels, the 2nd column holds the feature values I need to calculate the custom scores).

However, doing so violates the classifier's requirement of a 1D label array and triggers the following error:

ValueError: Unknown label type: 'continuous-multioutput'

Any help is much appreciated.

Thank you.


Solution

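  • You can do something like this (note that you have given no real code, so this is bare-bones):

    import numpy as np
    from sklearn.metrics import make_scorer

    X = [...]  # assumed to be a 2D NumPy array (samples x features)
    y = [...]
    
    def custom_scorer_function(y, y_pred, **kwargs):
       # Read the feature column from the X defined above (enclosing scope).
       # Caveat: inside GridSearchCV, y and y_pred cover only the current
       # validation fold, so this column must be cut down to the same rows.
       a_feature = X[:, 1]
       # now have y, y_pred and the feature you want;
       # compute and return the score here
    
    custom_scorer = make_scorer(custom_scorer_function, greater_is_better=True)
    ...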
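One caveat with reading X from the enclosing scope: during cross-validation the score function only receives the labels and predictions of the current validation fold, so a column taken from the full X will generally not line up with them. A possible way around this, sketched here under assumptions, is to skip make_scorer and give GridSearchCV a callable with the signature scorer(estimator, X, y), which its scoring parameter also accepts; the callable then receives the fold's own X. The toy data, the column index FEATURE_COL and the scoring rule (accuracy weighted by the absolute feature value) are made up for illustration and are not taken from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical: index of the feature the score should depend on.
FEATURE_COL = 1


def feature_weighted_scorer(estimator, X, y):
    """Scorer with the (estimator, X, y) signature accepted by `scoring`.

    X and y are the rows of the current validation fold, so the feature
    column is automatically aligned with y and the predictions.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    y_pred = estimator.predict(X)

    # Illustrative rule only: accuracy weighted by |feature value|.
    weights = np.abs(X[:, FEATURE_COL])
    total = weights.sum()
    if total == 0:
        return 0.0
    return float(np.sum(weights * (y_pred == y)) / total)


# Toy data standing in for the 18-feature dataset from the question.
X, y = make_classification(n_samples=120, n_features=18, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 4]},
    scoring=feature_weighted_scorer,  # higher is treated as better
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Because the callable is handed the estimator and the fold's data directly, nothing has to be packed into y, and the classifier still sees a plain 1D label array.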