This is my Python code to calculate accuracy, precision, recall, and F1 score with K-fold cross-validation.
In the code, I sum the accuracy, precision, recall, and F1 score from each fold, then divide each sum by n_folds. But I don't know whether this is the right way to calculate those scores. How can I tell?
a = 0
p = 0
r = 0
f = 0
for fold in range(0, n_folds):
    # splitting the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=int(len(y)/n_folds))
    clf.fit(X_train, y_train)
    x_test_prediction = clf.predict(X_test)
    a = a + accuracy_score(x_test_prediction, y_test)
    p = p + precision_score(x_test_prediction, y_test)
    r = r + recall_score(x_test_prediction, y_test)
    f = f + f1_score(x_test_prediction, y_test)
accuracy_score = a
precision_score = p
recall_score = r
f1_score = f
print("accuracy score :", accuracy_score/n_folds)
print("precision score :", precision_score/n_folds)
print("recall score :", recall_score/n_folds)
print("f1 score :", f1_score/n_folds)
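There is a function that handles cross-validation for you: cross_validate. Your approach of averaging the per-fold scores is sound, with three caveats. First, calling train_test_split inside the loop draws a new random split on every iteration, so the test sets can overlap; that is repeated random subsampling rather than true K-fold. Second, the scikit-learn metric functions expect (y_true, y_pred) in that order, so passing the predictions first swaps precision and recall for binary labels. Third, assigning to accuracy_score, precision_score, recall_score, and f1_score after the loop shadows the imported functions, so the code fails if you run it again in the same session.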
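If you want to keep the manual loop, here is a minimal sketch of how it could look with non-overlapping folds. It assumes StratifiedKFold for the splits, the iris data, a LogisticRegression classifier, and macro averaging (iris has three classes); the helper name manual_cv_scores is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold


def manual_cv_scores(clf, X, y, n_folds=5, seed=42):
    """Hypothetical helper: mean of each metric over non-overlapping folds."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    per_fold = {"accuracy": [], "precision": [], "recall": [], "f1": []}
    for train_idx, test_idx in skf.split(X, y):
        # Refit on the training part of this fold, predict on the held-out part
        clf.fit(X[train_idx], y[train_idx])
        y_true = y[test_idx]
        y_pred = clf.predict(X[test_idx])
        # scikit-learn metrics take (y_true, y_pred), in that order
        per_fold["accuracy"].append(accuracy_score(y_true, y_pred))
        per_fold["precision"].append(precision_score(y_true, y_pred, average="macro"))
        per_fold["recall"].append(recall_score(y_true, y_pred, average="macro"))
        per_fold["f1"].append(f1_score(y_true, y_pred, average="macro"))
    # Average the per-fold values, which is the same as sum / n_folds
    return {name: float(np.mean(values)) for name, values in per_fold.items()}


X, y = load_iris(return_X_y=True)
mean_scores = manual_cv_scores(LogisticRegression(max_iter=1000), X, y)
for name, value in mean_scores.items():
    print(f"{name} score: {value:.3f}")
```

Storing the per-fold values in lists, rather than in running sums named after the metric functions, avoids shadowing the imports and lets you inspect the spread across folds.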
Note that it is not a good idea to use your entire data set to build your model; hold out a test set first. See the scikit-learn documentation on evaluating estimator performance:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_validate
n_folds = 5
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000, random_state=42)
scoring = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']
scores = cross_validate(clf, X_train, y_train, cv=n_folds, scoring=scoring, return_train_score=True)
df_scores = pd.DataFrame(scores)
Output:
>>> df_scores
   fit_time  score_time  test_accuracy  train_accuracy  test_precision_macro  train_precision_macro  test_recall_macro  train_recall_macro  test_f1_macro  train_f1_macro
0  0.012872    0.004308       1.000000        0.958333              1.000000               0.959477           1.000000            0.958333       1.000000        0.958293
1  0.009851    0.004276       1.000000        0.968750              1.000000               0.969281           1.000000            0.968394       1.000000        0.968681
2  0.009777    0.003775       0.875000        1.000000              0.909091               1.000000           0.875000            1.000000       0.870445        1.000000
3  0.009764    0.004038       1.000000        0.979167              1.000000               0.979798           1.000000            0.979798       1.000000        0.979167
4  0.010602    0.003765       0.958333        0.968750              0.962963               0.968750           0.958333            0.969045       0.958170        0.968742
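To get the single averaged number per metric that the question asks about, you can take the mean of each test_* column. The sketch below is one way to do that with NumPy on the dictionary returned by cross_validate; the data set, classifier, and the summary variable name are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

# cross_validate returns a dict of arrays with one entry per fold
scores = cross_validate(clf, X, y, cv=5, scoring=scoring)

# Keep only the test metrics and reduce each to (mean, standard deviation)
summary = {
    key: (float(np.mean(values)), float(np.std(values)))
    for key, values in scores.items()
    if key.startswith("test_")
}

for key, (mean, std) in summary.items():
    print(f"{key}: {mean:.3f} +/- {std:.3f}")
```

Reporting the standard deviation alongside the mean gives a rough idea of how much the score varies from fold to fold.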
See the scikit-learn documentation for the other predefined scoring values.