r confusion-matrix glmnet lasso-regression

Get a confusion matrix, plot predicted probabilities, and set a cutoff value using the glmnet package in R


I am trying to fit a logistic lasso with the glmnet package in R. I found a method online for finding an optimal value of lambda, but I don't know how to get the confusion matrix or plot the predicted probabilities. Here is a small data frame, df, to demonstrate what I've tried.

df<-data.frame(date = c(20120324,20120329,20121216,20130216,20130725,20130729,20130930,20131015,20131124,20131225,
                        20140324,20140530,20140613,20140721,20140630,20150102,20150214,20150312,20150316,20150329),
               temperature=c(35,36.5,34.3,37.8,39,40,34.5,35.9,35.8,36.1,37,35,36,36.3,37.8,38.1,39.2,34.5,34.9,35.2),
               bmi=c(20,23,25,27,32,24,35,21,19,29,21,32,21,22,24,25,19,18,25,26),
               asthma=c(1,1,0,1,0,0,1,1,0,0,1,1,1,0,1,0,0,1,1,0))

set.seed(101)
# Select 70% of the rows of `df` as the training sample
# (named train_idx so it does not mask base::sample)
train_idx <- sample.int(n = nrow(df), size = floor(.7*nrow(df)), replace = FALSE)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

x<-model.matrix(asthma ~ temperature+bmi,data=train)[,-1]
y<-as.matrix(train$asthma)

# Note alpha=1 for lasso only 
library(glmnet)
glmmod <- glmnet(x, y, alpha=1, family="binomial")

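# glmnet's approach: automated cross-validation
# (family = "binomial" is needed here too; cv.glmnet defaults to gaussian)
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# coefficients of the final model
coef_cv <- coef(cvfit, s = "lambda.min")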

Here, asthma is the binary response variable for the logistic lasso, and temperature and bmi are the independent (predictor) variables. With this code I can get an optimal lambda, but after that I don't know how to get a confusion matrix or plot the predicted probabilities. If possible, I would also like to choose the cutoff value that maximizes both specificity and sensitivity and get the confusion matrix at that cutoff.

I've seen code that does this with glm, but I haven't seen any for glmnet.


Solution

  • I couldn't really get your example data to give any usable predictions, so here's an example that comes with the glmnet package:

    library(glmnet)
    data(BinomialExample)
    x <- BinomialExample$x
    y <- BinomialExample$y
    
    # Fit a model using `cv.glmnet`
    cfit <- cv.glmnet(x, y, family = "binomial")
    
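    # Use your model to make predictions
    # (predict() on a cv.glmnet object uses s = "lambda.1se" by default;
    # pass s = "lambda.min" if you want the lambda with the lowest CV error)
    predicted_probabilities <- predict(cfit, newx = x, type = "response")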
    
    # You can decide on an optimal threshold to turn your
    # predicted probabilities into classifications e.g.
    threshold <- 0.5
    predicted_classes <- ifelse(predicted_probabilities > threshold,1,0)
    
    # You can then make a confusion matrix like this:
    table(predicted_classes,y)
    
    # Or if you don't need to inspect the probabilities and pick a threshold
    # you can produce one directly from your model object like this:
    confusion.glmnet(cfit,newx=x,newy=y)
    

    Note that you may want to produce a confusion matrix using test data that is separate from the data you used to train your model.
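
As a rough sketch of the remaining parts of the question (a held-out confusion matrix, a cutoff that balances sensitivity and specificity, and a plot of the predicted probabilities), something like the following should work. This is not a built-in glmnet feature: the cutoff is picked by scanning a grid and maximising Youden's J (sensitivity + specificity - 1), which is one common choice that I'm assuming here, and names such as `train_idx`, `youden_j` and `best_cutoff` are made up for illustration. It also assumes `y` is coded 0/1 or is a two-level factor.

```r
library(glmnet)

data(BinomialExample)
x <- BinomialExample$x
# Assumption: y is 0/1 or a two-level factor; recode to integer 0/1 either way
y <- as.integer(as.factor(BinomialExample$y)) - 1L

# Hold out 30% of the rows for evaluation
set.seed(101)
n <- nrow(x)
train_idx <- sample.int(n, size = floor(0.7 * n))

# Fit the logistic lasso on the training rows only
cfit <- cv.glmnet(x[train_idx, ], y[train_idx], family = "binomial", alpha = 1)

# Predicted probabilities at lambda.min for training and test rows
train_prob <- as.numeric(predict(cfit, newx = x[train_idx, ],
                                 s = "lambda.min", type = "response"))
test_prob  <- as.numeric(predict(cfit, newx = x[-train_idx, ],
                                 s = "lambda.min", type = "response"))
train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Hypothetical helper: Youden's J = sensitivity + specificity - 1 at one cutoff
youden_j <- function(prob, truth, cutoff) {
  pred <- prob > cutoff
  sens <- sum(pred & truth == 1) / sum(truth == 1)
  spec <- sum(!pred & truth == 0) / sum(truth == 0)
  sens + spec - 1
}

# Scan a grid of cutoffs on the training predictions and keep the best one
cutoffs <- seq(0.01, 0.99, by = 0.01)
j_values <- sapply(cutoffs, function(ct) youden_j(train_prob, train_y, ct))
best_cutoff <- cutoffs[which.max(j_values)]

# Confusion matrix on the held-out rows at the chosen cutoff
test_class <- factor(as.integer(test_prob > best_cutoff), levels = c(0, 1))
conf_mat <- table(predicted = test_class,
                  actual = factor(test_y, levels = c(0, 1)))
print(best_cutoff)
print(conf_mat)

# Plot the held-out predicted probabilities, coloured by true class,
# with the chosen cutoff drawn as a dashed line
plot(test_prob, col = ifelse(test_y == 1, "red", "blue"), pch = 19,
     ylim = c(0, 1), xlab = "Test observation", ylab = "Predicted probability")
abline(h = best_cutoff, lty = 2)
```

Choosing the cutoff on the training rows and reporting the confusion matrix on the test rows keeps the two steps separate. With a data set as small as the 20-row `df` in the question, though, both the cutoff and the confusion matrix will be very noisy.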