I am trying to fit a logistic lasso using the glmnet package in R.
I found a method online for choosing an optimal value of lambda, but I don't know how to get the confusion matrix or plot the predicted probabilities.
Here is a small data frame, df, to demonstrate what I've tried.
df<-data.frame(date = c(20120324,20120329,20121216,20130216,20130725,20130729,20130930,20131015,20131124,20131225,
20140324,20140530,20140613,20140721,20140630,20150102,20150214,20150312,20150316,20150329),
temperature=c(35,36.5,34.3,37.8,39,40,34.5,35.9,35.8,36.1,37,35,36,36.3,37.8,38.1,39.2,34.5,34.9,35.2),
bmi=c(20,23,25,27,32,24,35,21,19,29,21,32,21,22,24,25,19,18,25,26),
asthma=c(1,1,0,1,0,0,1,1,0,0,1,1,1,0,1,0,0,1,1,0))
set.seed(101)
# Now Selecting 70% of data as sample from total 'n' rows of `df`
sample <- sample.int(n = nrow(df), size = floor(.7*nrow(df)), replace = F)
train <- df[sample, ]
test <- df[-sample, ]
x<-model.matrix(asthma ~ temperature+bmi,data=train)[,-1]
y<-as.matrix(train$asthma)
# Note alpha=1 for lasso only
library(glmnet)
glmmod <- glmnet(x, y, alpha=1, family="binomial")
# glmnet's approach: automated cross validation
cvfit = cv.glmnet(x, y, alpha=1, family="binomial")
# coefficients of the final model
coef_cv=coef(cvfit, s = "lambda.min")
Here, asthma is the binary response variable for the logistic lasso, and temperature and bmi are the independent (predictor) variables.
With this code, I can get an optimal lambda for the logistic lasso.
After that, though, I don't know how to get a confusion matrix or plot the predicted probabilities. If possible, I'd also like to choose the cutoff value that maximizes both specificity and sensitivity and get the confusion matrix at that cutoff.
I've seen code that does these things with glm, but I haven't seen any for glmnet.
I couldn't really get your example data to give any usable predictions, so here's an example that comes with the glmnet package:
library(glmnet)
data(BinomialExample)
x <- BinomialExample$x
y <- BinomialExample$y
# Fit a model using `cv.glmnet`
cfit <- cv.glmnet(x, y, family = "binomial")
# Use your model to make predictions
predicted_probabilities <- predict(cfit,newx=x,type="response")
# You can decide on an optimal threshold to turn your
# predicted probabilities into classifications e.g.
threshold <- 0.5
predicted_classes <- ifelse(predicted_probabilities > threshold,1,0)
# You can then make a confusion matrix like this:
table(predicted_classes,y)
# Or if you don't need to inspect the probabilities and pick a threshold
# you can produce one directly from your model object like this:
confusion.glmnet(cfit,newx=x,newy=y)
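The fixed threshold of 0.5 above is only one option. As a rough sketch, one common way to pick a cutoff that balances sensitivity and specificity is to maximize Youden's J (sensitivity + specificity - 1) over a grid of candidate cutoffs, then plot the predicted probabilities against it. The grid, the use of lambda.min, and the names probs, cutoffs and best_cutoff are choices made for this illustration, not something glmnet prescribes.

```r
library(glmnet)

# Same bundled example data as above
data(BinomialExample)
x <- BinomialExample$x
y <- BinomialExample$y

set.seed(101)  # cv.glmnet assigns folds at random
cfit <- cv.glmnet(x, y, family = "binomial")

# Predicted probabilities at lambda.min, as a plain numeric vector
probs <- as.numeric(predict(cfit, newx = x, s = "lambda.min", type = "response"))

# Candidate cutoffs: a simple grid (an arbitrary choice for this sketch)
cutoffs <- seq(0.01, 0.99, by = 0.01)

# Sensitivity and specificity at each cutoff
sens <- sapply(cutoffs, function(t) mean(probs[y == 1] > t))
spec <- sapply(cutoffs, function(t) mean(probs[y == 0] <= t))

# Youden's J = sensitivity + specificity - 1; keep the cutoff that maximizes it
youden <- sens + spec - 1
best_cutoff <- cutoffs[which.max(youden)]

# Confusion matrix at the chosen cutoff
best_classes <- factor(as.integer(probs > best_cutoff), levels = c(0, 1))
conf_mat <- table(predicted = best_classes, actual = y)
print(conf_mat)

# Plot the predicted probabilities, colored by observed class,
# with the chosen cutoff drawn as a dashed line
plot(probs, col = ifelse(y == 1, "red", "blue"), pch = 19, ylim = c(0, 1),
     xlab = "Observation", ylab = "Predicted probability")
abline(h = best_cutoff, lty = 2)
```

Because the cutoff here is tuned on the same data the model was fit to, the resulting confusion matrix will tend to look optimistic.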
Note that you may want to produce a confusion matrix using test data that is separate from the data you used to train your model.
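A minimal sketch of that idea, assuming a 70/30 split like the one in the question: fit on the training rows only, then build the confusion matrix from the held-out rows. The names train_idx, cfit_train and test_conf are made up for this example, and the 0.5 threshold and lambda.min are just illustrative choices.

```r
library(glmnet)

data(BinomialExample)
x <- BinomialExample$x
y <- BinomialExample$y

# Hold out 30% of the rows for testing
set.seed(101)
train_idx <- sample.int(nrow(x), size = floor(0.7 * nrow(x)))

# Cross-validate and fit on the training rows only
cfit_train <- cv.glmnet(x[train_idx, ], y[train_idx], family = "binomial")

# Predict probabilities for the held-out rows
test_probs <- as.numeric(predict(cfit_train, newx = x[-train_idx, ],
                                 s = "lambda.min", type = "response"))

# Classify at a 0.5 threshold and tabulate against the true labels
test_classes <- factor(as.integer(test_probs > 0.5), levels = c(0, 1))
test_conf <- table(predicted = test_classes, actual = y[-train_idx])
print(test_conf)
```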