axis 1 is out of bounds for array of dimension 1 when using SVM


Training data:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_excel('C:/Users/Ram Prakash/Downloads/Data.xlsx', sheet_name='Multiclass')
X = df.drop('Fault Type', axis=1)
y = df.iloc[:, 10]  # target column (index 10)
le = LabelEncoder()
y = le.fit_transform(y)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2021)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

I am trying to classify the data into multiple classes. However, I keep getting the error shown below.

Code:

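import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

# Classification with default parameters
# Fit SVM classifier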
clf_default = SVC(kernel='rbf')
clf = OneVsOneClassifier(clf_default).fit(X_train, y_train)
print('(Cross Validation) AUC Score:', np.mean(cross_val_score(estimator=clf, X=X_train, y=y_train, cv=5, scoring = 'roc_auc')))

# Show result
print('(Test set) Confusion Matrix:')
c = label_binarize(y_test, classes = labels)
print(confusion_matrix(y_test, clf.predict(X_test)))
print('(Test set) AUC Score:', roc_auc_score(y_test, clf.predict(X_test), average = 'macro', multi_class = 'ovo'))

When I run the code, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Ram Prakash\AppData\Local\Temp\ipykernel_23060\2986435798.py", line 6, in <cell line: 5>
    print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.predict(X_test), average = None, multi_class = 'ovo'))
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 566, in roc_auc_score
    return _multiclass_roc_auc_score(
  File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 638, in _multiclass_roc_auc_score
    if not np.allclose(1, y_score.sum(axis=1)):
  File "C:\Anaconda\lib\site-packages\numpy\core\_methods.py", line 48, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1

The error is raised by this line:

print('(Test set) AUC Score:', roc_auc_score(y_test, clf.predict(X_test), average = 'macro', multi_class = 'ovo'))

Switching from predict() to decision_function() gives a different error:

Traceback (most recent call last):
File "C:\Users\Ram Prakash\AppData\Local\Temp\ipykernel_17052\3143009577.py", line 6, in <cell line: 5>
    print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.decision_function(X_test), average = 'macro', multi_class = 'ovo'))
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 566, in roc_auc_score
    return _multiclass_roc_auc_score(
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 639, in _multiclass_roc_auc_score
    raise ValueError(
ValueError: Target scores need to be probabilities for multiclass roc_auc, i.e. they should sum up to 1.0 over classes

I no longer see the above errors. However, the scoring fails during cross-validation and gives an AUC score of nan:

C:\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py:794: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_scorer.py", line 115, in __call__
    score = scorer._score(cached_call, estimator, *args, **kwargs)
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_scorer.py", line 367, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Output: (Cross Validation) AUC Score: nan

Using predict_proba() on the test set then raises:

Traceback (most recent call last):
File "C:\Users\Ram Prakash\AppData\Local\Temp\ipykernel_30780\4160018733.py", line 6, in <cell line: 5>
    print('(Test set) AUC Score:', roc_auc_score(y_test, clf_default.predict_proba(x_test), average = 'macro', multi_class = 'ovo'))
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 566, in roc_auc_score
    return _multiclass_roc_auc_score(
File "C:\Anaconda\lib\site-packages\sklearn\metrics\_ranking.py", line 683, in _multiclass_roc_auc_score
    raise ValueError(
ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

Solution

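  • roc_auc_score() needs class probabilities in the multiclass case: y_score must have shape (n_samples, n_classes), and each row must sum to 1 (see the scikit-learn documentation for roc_auc_score). predict() returns a 1-D array of labels, which is why y_score.sum(axis=1) raises the AxisError, and decision_function() returns scores that are not probabilities, which is why it raises the "need to be probabilities" ValueError. You will need to train SVC with probability=True:

    clf_default = SVC(kernel='rbf', probability=True)
    

    Then use predict_proba() for predictions. Note that OneVsOneClassifier does not provide predict_proba(), so fit the SVC directly; SVC already handles multiclass problems with a one-vs-one scheme internally:

    clf = clf_default.fit(X_train, y_train)
    print('(Test set) AUC Score:', roc_auc_score(y_test, clf.predict_proba(X_test), average = 'macro', multi_class = 'ovo'))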
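
The whole flow can be sketched end to end as follows. This is a minimal example under stated assumptions, not the asker's actual pipeline: make_classification stands in for the Excel sheet, and the 4 classes and 10 features are invented for illustration. It also shows two things the question touches on. For cross-validation, scoring='roc_auc' is the binary scorer, so the multiclass variant 'roc_auc_ovo' is used instead. For the "Number of classes in y_true not equal to the number of columns" error, one likely cause (an assumption, since the data is not shown) is a test split that is missing a class; stratify=y and labels=clf.classes_ guard against that.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Excel data (assumption): 4 classes, 10 features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=2021,
)

# stratify=y keeps every class present in both the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2021, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# SVC is multiclass (one-vs-one) on its own; probability=True enables predict_proba().
clf = SVC(kernel='rbf', probability=True, random_state=2021).fit(X_train, y_train)

labels_1d = clf.predict(X_test)    # 1-D array of labels: not usable as y_score here
proba = clf.predict_proba(X_test)  # 2-D array (n_samples, n_classes), rows sum to 1

test_auc = roc_auc_score(
    y_test, proba, average='macro', multi_class='ovo', labels=clf.classes_
)
print('(Test set) AUC Score:', test_auc)

# 'roc_auc_ovo' is the multiclass one-vs-one scorer; plain 'roc_auc' is binary only.
cv_scores = cross_val_score(
    SVC(kernel='rbf', probability=True, random_state=2021),
    X_train, y_train, cv=5, scoring='roc_auc_ovo',
)
cv_auc = np.mean(cv_scores)
print('(Cross Validation) AUC Score:', cv_auc)
```

With the real data, replace the make_classification call with the read_excel and LabelEncoder steps from the question; the rest should carry over unchanged.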