I just discovered the `lime` package in R and am still trying to fully understand it. I'm stumped, though, by the visualization produced by `plot_features`.
Please excuse my naivety.
My question is this: is the case number for each row sequential? In other words, is case 416 equivalent to row 416 in the data? If not, how do I know which row each case number refers to?
Sample code to reproduce the plot:
```r
library(MASS)   # provides the biopsy data and lda()
library(lime)

data(biopsy)
biopsy$ID <- NULL              # drop the original sample ID column
biopsy <- na.omit(biopsy)      # drop rows with missing values (remaining row names are kept)
biopsy2 <- data.frame(ID = 1:nrow(biopsy), biopsy)
names(biopsy2) <- c('ID', 'clump thickness', 'uniformity of cell size',
                    'uniformity of cell shape', 'marginal adhesion',
                    'single epithelial cell size', 'bare nuclei',
                    'bland chromatin', 'normal nucleoli', 'mitoses',
                    'class')

# Now we'll fit a linear discriminant model on all but 4 cases
set.seed(4)
test_set <- sample(seq_len(nrow(biopsy2)), 4)
prediction <- biopsy2$class
biopsy2$class <- NULL
model <- lda(biopsy2[-test_set, ], prediction[-test_set])
predict(model, biopsy2[test_set, ])

# Build the explainer on the training rows, then explain the 4 held-out rows
explainer <- lime(biopsy2[-test_set, ], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(biopsy2[test_set, ], explainer, n_labels = 1, n_features = 4)
plot_features(explanation, ncol = 1)
```
EDIT: Added an extra column called `ID` to the biopsy table.
As you can see in `explanation`, the plot goes through the cases in the order in which they appear, starting from the first row:
```r
head(explanation[, 1:5])
#       model_type case  label label_prob  model_r2
# 1 classification  416 benign  0.9943635 0.5432439
# 2 classification  416 benign  0.9943635 0.5432439
# 3 classification  416 benign  0.9943635 0.5432439
# 4 classification  416 benign  0.9943635 0.5432439
# 5 classification    7 benign  0.9527375 0.6586789
# 6 classification    7 benign  0.9527375 0.6586789
```
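The same order can also be read off programmatically. Below is a minimal sketch using `rle()` on a hypothetical stand-in for `explanation$case`, with four feature rows per case to match `n_features = 4`. The run values list the cases from top to bottom, and the run lengths give how many rows each case occupies. This assumes each case's rows are contiguous, as the output above suggests. With the real object you would call `rle(explanation$case)`.

```r
# Hypothetical stand-in for explanation$case (4 feature rows per case)
case <- rep(c("416", "7"), each = 4)

# Run-length encoding: consecutive identical labels collapse into one run
# (assumes all rows of a case are adjacent)
runs <- rle(case)
runs$values    # "416" "7"  -> cases in order of appearance
runs$lengths   # 4 4        -> rows per case
```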
However, since each case spans several rows of `explanation` (one per feature, so four here with `n_features = 4`), it is useful to know which rows belong to each case. For that you can use
```r
which(416 == explanation$case)
# [1] 1 2 3 4
```
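To get the rows of every case at once instead of calling `which()` once per case, `split()` can group row indices by case label. Here is a minimal sketch on a hypothetical stand-in for `explanation$case`. The factor levels are set with `unique()` so that cases keep their order of first appearance rather than being sorted. With the real object you would use `explanation$case` in place of the toy vector.

```r
# Hypothetical stand-in for explanation$case: 2 cases x 4 feature rows
case <- rep(c("416", "7"), each = 4)

# Keep cases in order of first appearance rather than sorted order
case_f <- factor(case, levels = unique(case))

# One vector of row indices per case, named by the case label
rows_by_case <- split(seq_along(case), case_f)
rows_by_case[["416"]]   # 1 2 3 4
rows_by_case[["7"]]     # 5 6 7 8
```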
so that
```r
explanation[which(416 == explanation$case), 1:5]
#       model_type case  label label_prob model_r2
# 1 classification  416 benign  0.9949716 0.551287
# 2 classification  416 benign  0.9949716 0.551287
# 3 classification  416 benign  0.9949716 0.551287
# 4 classification  416 benign  0.9949716 0.551287
```
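On the original question of whether case 416 is row 416: as far as I can tell from lime's source, `explain()` labels each case with `rownames()` of the data frame it is given, not with a running index. Treat this as an assumption and check it, for example by comparing `rownames(biopsy2)[test_set]` with `unique(explanation$case)`. If it holds, case 416 is the row *named* "416", which need not be the 416th row. `na.omit()` keeps the original row names of the rows it retains, and `data.frame()` carries them into `biopsy2`. Below is a minimal base-R sketch on hypothetical toy data showing the difference between indexing by name and by position:

```r
# Hypothetical toy data standing in for biopsy: row 2 has a missing value
raw <- data.frame(x = c(10, NA, 30, 40, 50))

# na.omit() drops row 2 but keeps the remaining ORIGINAL row names
clean <- na.omit(raw)
rownames(clean)                     # "1" "3" "4" "5"

# data.frame() inherits those row names; ID records each row's position
data2 <- data.frame(ID = seq_len(nrow(clean)), clean)
rownames(data2)                     # "1" "3" "4" "5"

# Hypothetical case labels, as they might appear in explanation$case
cases <- c("4", "1")

# Character subscripts match row NAMES...
data2[cases, "ID"]                  # 3 1
# ...match() converts names to positions...
match(cases, rownames(data2))       # 3 1
# ...whereas numeric subscripts are POSITIONS and pick different rows
data2[as.numeric(cases), "ID"]      # 4 1
```

With the question's objects, and under the same assumption, `biopsy2[as.character(unique(explanation$case)), "ID"]` should return the position of each explained case in `biopsy2`, because `ID` was defined as `1:nrow(biopsy)`. The `as.character()` guards against the labels being stored as numbers, which would silently switch to positional indexing.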