Today I tried to use a bootstrap to obtain confidence intervals for the AUC of several ML algorithms.
I used my own medical dataset with 61 features, formatted like this:
Age | Female |
---|---|
65 | 1 |
45 | 0 |
For example, I fit this kind of model:
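import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# data_sevrage is a pandas DataFrame; 'Echec_sevrage' is the binary target column
X = data_sevrage.drop(['Echec_sevrage'], axis=1)
y = data_sevrage['Echec_sevrage']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
lr = LogisticRegression(C=10, penalty='l1', solver='saga', max_iter=500).fit(X_train, y_train)
score = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])
precision, recall, thresholds = precision_recall_curve(y_test, lr.predict_proba(X_test)[:, 1])
auc_precision_recall = metrics.auc(recall, precision)
y_pred = lr.predict(X_test)
print('ROC AUC score :', score)
print('auc_precision_recall :', auc_precision_recall)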
Finally, I used the bootstrap method to obtain the confidence interval (I took the code from another question: How to compare ROC AUC scores of different binary classifiers and assess statistical significance in Python?):
def bootstrap_auc(clf, X_train, y_train, X_test, y_test, nsamples=1000):
    auc_values = []
    for b in range(nsamples):
        idx = np.random.randint(X_train.shape[0], size=X_train.shape[0])
        clf.fit(X_train[idx], y_train[idx])
        pred = clf.predict_proba(X_test)[:, 1]
        roc_auc = roc_auc_score(y_test.ravel(), pred.ravel())
        auc_values.append(roc_auc)
    return np.percentile(auc_values, (2.5, 97.5))

bootstrap_auc(lr, X_train, y_train, X_test, y_test, nsamples=1000)
I get this error:
"None of [Int64Index([21, 22, 20, 31, 30, 13, 22, 1, 31, 3, 2, 9, 9, 18, 29, 30, 31,\n 31, 16, 11, 23, 7, 19, 10, 14, 5, 10, 25, 30, 24, 8, 20],\n dtype='int64')] are in the [columns]"
I also tried this other method, and I get nearly the same error:
n_bootstraps = 1000
rng_seed = 42  # control reproducibility
bootstrapped_scores = []
rng = np.random.RandomState(rng_seed)
for i in range(n_bootstraps):
    # bootstrap by sampling with replacement on the prediction indices
    indices = rng.randint(0, len(y_pred), len(y_pred))
    if len(np.unique(y_test[indices])) < 2:
        # We need at least one positive and one negative sample for ROC AUC
        # to be defined: reject the sample
        continue
    score = roc_auc_score(y_test[indices], y_pred[indices])
    bootstrapped_scores.append(score)
    print("Bootstrap #{} ROC area: {:0.3f}".format(i + 1, score))
'[6, 3, 12, 14, 10, 7, 9] not in index'
Can you help me, please? I have tried many solutions, but I get this error every time.
Thank you!
Bootstrap method for AUC confidence interval on machine learning algorithm.
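The problem is solved! It was just a format problem: with a pandas DataFrame or Series, `X_train[idx]` and `y_test[indices]` look the integers up as column or index labels instead of row positions. Converting the data to NumPy arrays fixes it. Thank you!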
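For anyone hitting the same error, here is a minimal sketch of that fix, not a definitive implementation. It assumes the inputs may be pandas objects and converts them with `np.asarray` before indexing. The synthetic dataset, the column names `f0`...`f4` and `target`, and the `seed` parameter are made up for illustration and stand in for the real medical data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def bootstrap_auc(clf, X_train, y_train, X_test, y_test, nsamples=1000, seed=0):
    """Percentile interval for the test AUC, refitting clf on resampled training rows."""
    # Convert pandas objects to NumPy so that X_train[idx] selects rows by position.
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    X_test, y_test = np.asarray(X_test), np.asarray(y_test)
    rng = np.random.RandomState(seed)  # seed is an added, hypothetical parameter
    auc_values = []
    for _ in range(nsamples):
        # Sample training row positions with replacement.
        idx = rng.randint(0, X_train.shape[0], size=X_train.shape[0])
        if len(np.unique(y_train[idx])) < 2:
            continue  # a resample with a single class cannot be fitted
        clf.fit(X_train[idx], y_train[idx])
        pred = clf.predict_proba(X_test)[:, 1]
        auc_values.append(roc_auc_score(y_test, pred))
    return np.percentile(auc_values, (2.5, 97.5))


# Synthetic stand-in for the medical dataset (hypothetical data and column names).
X_arr, y_arr = make_classification(n_samples=200, n_features=5, random_state=0)
data = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])
data["target"] = y_arr
X = data.drop(["target"], axis=1)
y = data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

lr = LogisticRegression(max_iter=500)
ci_low, ci_high = bootstrap_auc(lr, X_train, y_train, X_test, y_test, nsamples=50)
print("95% interval for ROC AUC:", ci_low, ci_high)
```

An alternative that keeps the pandas objects is positional indexing with `.iloc`, for example `X_train.iloc[idx]` and `y_train.iloc[idx]`. In the second snippet from the question, `y_test[indices]` fails for the same reason and can be written as `np.asarray(y_test)[indices]`.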