I am trying to figure out how the F1_Score function in the MLmetrics library works when the y_pred values are non-binary.
For example:
library(MLmetrics)
y <- c(1,1,1,1,1,0,0,0,0,0)
x <- c(1, 0.8, 0.654, 0.99, 0.75, 0.1, 0.3, 0.6, 0.05, 0.2)
x_preds <- ifelse(x < 0.5, 0, 1)
getF1 <- F1_Score(y_true=y, y_pred=x, positive="1")
getF2 <- F1_Score(y_true=y, y_pred=x_preds, positive="1")
print(getF1)
print(getF2)
This gives getF1 = 0.3333333 and getF2 = 0.9090909.
The example provided in the R documentation for the function is designed to calculate what I have called getF2, where I have specified exactly how to assign the probability scores to either class label based on a 0.5 threshold. What I am unclear on is how it calculates the F1 score if this threshold is not specified (getF1). Can anyone explain what the function does by default if you leave the probability scores as is and don't cast them to binary before calling the F1_Score function? I can't for the life of me figure out how it got 0.3333333.
Here is the function simplified:
f1_score <- function(y_true, y_pred, positive = '1') {
  tt <- table(y_true, y_pred)
  TP <- tt[positive, positive]
  FP <- tt[rownames(tt) != positive, positive]
  FN <- sum(tt[positive, colnames(tt) != positive])
  precision <- TP / (TP + FP)
  recall <- TP / (TP + FN)
  2 * (precision * recall) / (precision + recall)
}
f1_score(y, x, "1")
[1] 0.3333333
f1_score(y, x_preds, "1")
[1] 0.9090909
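To see where 0.3333333 comes from, note that table() treats every distinct value of a continuous y_pred as its own category, so table(y, x) is a 2 x 10 cross-table and only the column literally named "1" matches positive = "1". A quick walk-through with the same y and x:

```r
y <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
x <- c(1, 0.8, 0.654, 0.99, 0.75, 0.1, 0.3, 0.6, 0.05, 0.2)

# 2 x 10 table: one column per distinct probability value
tt <- table(y, x)

TP <- tt["1", "1"]                       # y = 1 and x exactly 1  -> 1
FP <- tt["0", "1"]                       # y = 0 and x exactly 1  -> 0
FN <- sum(tt["1", colnames(tt) != "1"])  # the other four positives -> 4

precision <- TP / (TP + FP)  # 1 / 1 = 1
recall    <- TP / (TP + FN)  # 1 / 5 = 0.2

2 * precision * recall / (precision + recall)  # 2 * 0.2 / 1.2 = 0.3333333
```

So only the single observation with a score of exactly 1 counts as a predicted positive; everything else in the positive row lands in FN, which drags recall down to 0.2 and the F1 score to 1/3.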
Notice how FP is not summed: we assume y_true has only two categories, so tt[rownames(tt) != positive, positive] is already a single number. If y_true has more categories, i.e. is non-binary, that expression returns a vector and you would need FP <- sum(tt[rownames(tt) != positive, positive]).
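Here is a minimal sketch of that summed-FP variant for a three-label case (f1_score_multi, y3, and p3 are made-up names for illustration; this is one-vs-rest F1 for a single positive class, not a claim about how MLmetrics itself handles multi-class input):

```r
# One-vs-rest F1 for one chosen positive class, with FP summed so it
# also works when y_true has more than two categories.
f1_score_multi <- function(y_true, y_pred, positive = "1") {
  tt <- table(y_true, y_pred)
  TP <- tt[positive, positive]
  FP <- sum(tt[rownames(tt) != positive, positive])  # sum over all other rows
  FN <- sum(tt[positive, colnames(tt) != positive])
  precision <- TP / (TP + FP)
  recall <- TP / (TP + FN)
  2 * (precision * recall) / (precision + recall)
}

y3 <- c(1, 1, 2, 2, 0, 0)  # three true labels: 0, 1, 2
p3 <- c(1, 2, 1, 2, 0, 1)  # predictions
f1_score_multi(y3, p3, "1")
# TP = 1, FP = 2 (one 0 and one 2 predicted as 1), FN = 1,
# so precision = 1/3, recall = 1/2, F1 = 0.4
```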