I was doing a logistic regression and made a table that represents the predicted probability ,actual class, and predicted class.
If the predicted probability is more than 0.5, I classified it as 1
,so the predicted class becomes 1
. But I want to change the threshold value from 0.5 to another value.
I was considering to find a threshold value that maximizes both true positive rate and true negative rate. Here I made a simple data df
to demonstrate what I want to do.
df<-data.frame(actual_class=c(0,1,0,0,1,1,1,0,0,1),
predicted_probability=c(0.51,0.3,0.2,0.35,0.78,0.69,0.81,0.31,0.59,0.12),
predicted_class=c(1,0,0,0,1,1,1,0,1,0))
If I can find a threshold value, I will classify using that value instead of 0.5. I don't know how to find a threshold value that both maximizes true positive rate and true negative rate.
You can check a range of values pretty easily:
probs <- seq(0, 1, by=.05)
names(probs) <- probs
results <- sapply(probs, function(x) df$actual_class == as.integer(df$predicted_probability > x))
results
is a 10 row by 21 column logical matrix showing when the predicted class equals the actual class:
colSums(results) # Number of correct predictions
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
5 5 5 4 5 5 4 6 6 6 6 7 8 8 7 7 6 5 5 5 5
predict <- as.integer(df$predicted_probability > .6)
xtabs(~df$actual_class+predict)
# predict
# df$actual_class 0 1
# 0 5 0
# 1 2 3
You can see that probabilities of .6 and .65 result in 8 correct predictions. This conclusion is based on the data you used in the analysis so it probably overestimates how successful you would be with new data.