Training data:
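import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_excel('C:/Users/Ram Prakash/Downloads/Data.xlsx',
                   sheet_name='Multiclass')
X = df.drop('Fault Type', axis=1)
y = df.iloc[0:, 10]
# Encode the class labels as integers (le is assumed to be a LabelEncoder)
le = LabelEncoder()
y = le.fit_transform(y)
# Train Test split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3, random_state=2021)
# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)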
I am trying to classify the data into multiple classes. However, I keep getting an error when computing the AUC score.
Code:
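import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC
# Classification with default parameters
# Fit SVM classifier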
clf_default = SVC(kernel='rbf')
clf = OneVsOneClassifier(clf_default).fit(X_train, y_train)
print('(Cross Validation) AUC Score:', np.mean(cross_val_score(estimator=clf, X=X_train, y=y_train, cv=5, scoring = 'roc_auc')))
# Show result
print('(Test set) Confusion Matrix:')
c = label_binarize(y_test, classes = labels)
print(confusion_matrix(y_test, clf.predict(X_test)))
print('(Test set) AUC Score:', roc_auc_score(y_test, clf.predict(X_test), average = 'macro', multi_class = 'ovo'))
While running the code I get the following error.
Traceback (most recent call last):
File "C:\Users\Ram Prakash\AppData\Local\Temp\ipykernel_23060\2986435798.py", line 6, in <cell line: 5>
print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.predict(X_test), average = None, multi_class = 'ovo'))
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 566, in roc_auc_score
return _multiclass_roc_auc_score(
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 638, in _multiclass_roc_auc_score
if not np.allclose(1, y_score.sum(axis=1)):
File "C:\Anaconda\lib\site-packages\numpy\core\_methods.py", line 48, in _sum
return umr_sum(a, axis, dtype, out, keepdims, initial, where)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
The error is raised by this line:
print('(Test set) AUC Score:', roc_auc_score(y_test, clf.predict(X_test), average = 'macro', multi_class = 'ovo'))
Traceback (most recent call last):
File "C:\Users\Ram Prakash\AppData\Local\Temp\ipykernel_17052\3143009577.py", line 6, in <cell line: 5>
print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.decision_function(X_test), average = 'macro', multi_class = 'ovo'))
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 566, in roc_auc_score
return _multiclass_roc_auc_score(
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 639, in _multiclass_roc_auc_score
raise ValueError(
ValueError: Target scores need to be probabilities for multiclass roc_auc, i.e. they should sum up to 1.0 over classes
I no longer see the above errors. However, the scoring fails and the cross-validation AUC score comes out as nan:
C:\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py:794: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_scorer.py", line 115, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_scorer.py", line 367, in _score
raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported
Output:
(Cross Validation) AUC Score: nan
Traceback (most recent call last):
File "C:\Users\Ram Prakash\AppData\Local\Temp\ipykernel_30780\4160018733.py", line 6, in <cell line: 5>
print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.predict_proba(x_test), average = 'macro', multi_class = 'ovo'))
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 566, in roc_auc_score
return _multiclass_roc_auc_score(
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 683, in _multiclass_roc_auc_score
raise ValueError(
ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'
roc_auc_score() should take the class probabilities in the multiclass case, as the scikit-learn documentation for roc_auc_score describes. You will need to train SVC with probability=True:
clf_default = SVC(kernel='rbf', probability=True)
Then use predict_proba() for predictions:
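# OneVsOneClassifier does not provide predict_proba, so fit the SVC itself
# (SVC already handles multiclass one-vs-one internally)
clf_default.fit(X_train, y_train)
print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.predict_proba(X_test), average='macro', multi_class='ovo'))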
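For reference, here is a minimal end-to-end sketch of the approach above, including the cross-validation step that returned nan in the question. It rests on some assumptions: the data is synthetic (make_classification stands in for the 'Multiclass' sheet, which is not available here), the class count of 4 is arbitrary, and the SVC is used directly rather than wrapped in OneVsOneClassifier. The plain 'roc_auc' scorer rejects multiclass targets, so the sketch uses the 'roc_auc_ovo' scorer instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class data standing in for the real spreadsheet (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=2021)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2021, stratify=y)

# Scale using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# probability=True enables predict_proba on the SVC.
clf = SVC(kernel='rbf', probability=True, random_state=2021).fit(X_train, y_train)

# Cross-validated multiclass AUC: 'roc_auc_ovo' instead of 'roc_auc'.
cv_auc = np.mean(cross_val_score(
    SVC(kernel='rbf', probability=True, random_state=2021),
    X_train, y_train, cv=5, scoring='roc_auc_ovo'))

# y_score must be 2-D: one column of probabilities per class.
proba = clf.predict_proba(X_test)
test_auc = roc_auc_score(y_test, proba, average='macro', multi_class='ovo')

print('(Cross Validation) AUC Score:', cv_auc)
print('(Test set) AUC Score:', test_auc)
```

The earlier errors in the question follow from the shape and content of y_score: predict() returns a 1-D array of labels (hence the AxisError on axis 1), and decision_function() returns scores that do not sum to 1 per row, while predict_proba() returns one probability column per class.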