Problem
I am trying to use scikit-learn's LogisticRegressionCV
with roc_auc_score
as the scoring metric.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
clf = LogisticRegressionCV(scoring=roc_auc_score)
But when I attempt to fit the model (clf.fit(X, y)
), it throws an error.
ValueError: average has to be one of (None, 'micro', 'macro', 'weighted', 'samples')
That's cool. It's clear what's going on: roc_auc_score
needs to be called with the average
argument specified, per its documentation and the error above. So I tried that.
clf = LogisticRegressionCV(scoring=roc_auc_score(average='weighted'))
But it turns out that roc_auc_score
can't be called with an optional argument alone, because this throws another error.
TypeError: roc_auc_score() takes at least 2 arguments (1 given)
Question
Any thoughts on how I can use roc_auc_score
as the scoring metric for LogisticRegressionCV
in a way that I can specify an argument for the scoring function?
I can't find an SO question on this issue or a discussion of this issue in scikit-learn's GitHub repo, but surely someone has run into this before?
I found a way to solve this problem!
scikit-learn offers a make_scorer
function in its metrics
module that allows a user to create a scoring object from one of its native scoring functions with arguments specified to non-default values (see here for more information on this function from the scikit-learn docs).
So, I created a scoring object with the average
argument specified.
roc_auc_weighted = sk.metrics.make_scorer(sk.metrics.roc_auc_score, average='weighted')
Then, I passed that object in the call to LogisticRegressionCV
and it ran without any issues!
clf = LogisticRegressionCV(scoring=roc_auc_weighted)