Is case number equivalent to data row number in Lime?


I just discovered the lime package in R and am still trying to fully understand it. I'm stumped, though, by the visualization produced by 'plot_features'.

Please excuse my naivety.

My question is this: is the case number for each row sequential? In other words, is case 416 equivalent to row 416 in the data? If not, how do I know which row each case number refers to?

[Figure: plot of feature weights]

Sample code to reproduce the image above:

library(MASS)
library(lime)
data(biopsy)
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
biopsy2 = data.frame(ID = 1:nrow(biopsy), biopsy)
names(biopsy2) <- c('ID','clump thickness', 'uniformity of cell size', 
                   'uniformity of cell shape', 'marginal adhesion',
                   'single epithelial cell size', 'bare nuclei', 
                   'bland chromatin', 'normal nucleoli', 'mitoses',
                   'class')
# Now we'll fit a linear discriminant model on all but 4 cases
set.seed(4)
test_set <- sample(seq_len(nrow(biopsy2)), 4)
prediction <- biopsy2$class
biopsy2$class <- NULL
model <- lda(biopsy2[-test_set, ], prediction[-test_set])
predict(model, biopsy2[test_set, ])
explainer <- lime(biopsy2[-test_set,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(biopsy2[test_set, ], explainer, n_labels = 1, n_features = 4)
plot_features(explanation, ncol = 1)

EDIT: Added an extra column to the biopsy table called ID


Solution

  • As you can see in the explanation object, the plot goes through the cases one by one, in the order in which they appear there:

    head(explanation[, 1:5])
          model_type case  label label_prob  model_r2
    1 classification  416 benign  0.9943635 0.5432439
    2 classification  416 benign  0.9943635 0.5432439
    3 classification  416 benign  0.9943635 0.5432439
    4 classification  416 benign  0.9943635 0.5432439
    5 classification    7 benign  0.9527375 0.6586789
    6 classification    7 benign  0.9527375 0.6586789
    

    However, since each case spans several rows (one per selected feature), it helps to know which rows of explanation correspond to a given case. For that you can use

    which(416 == explanation$case)
    # [1] 1 2 3 4
    

    so that

    explanation[which(416 == explanation$case), 1:5]
    #       model_type case  label label_prob model_r2
    # 1 classification  416 benign  0.9949716 0.551287
    # 2 classification  416 benign  0.9949716 0.551287
    # 3 classification  416 benign  0.9949716 0.551287
    # 4 classification  416 benign  0.9949716 0.551287
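
On the original question of whether case 416 is row 416: as far as the lime documentation for explain() describes it, the case column holds the row name of the explained observation, not its position. After na.omit() the row names keep their original values, so the two can differ. Below is a minimal sketch of that effect in base R, using a hypothetical toy data frame in place of biopsy (the names df, clean, clean2, case_label and pos are made up for illustration):

```r
# Hypothetical toy data standing in for biopsy: row 2 has a missing value.
df <- data.frame(x = c(10, NA, 30, 40, 50),
                 y = c("a", "b", "c", "d", "e"))

# na.omit() drops row 2 but keeps the original row names.
clean <- na.omit(df)
rownames(clean)
# [1] "1" "3" "4" "5"

# Adding a sequential ID column, as in the question, records the position
# of each row; the row names are carried over unchanged from `clean`.
clean2 <- data.frame(ID = seq_len(nrow(clean)), clean)

# A case label such as "4" is looked up by row name, not by position.
case_label <- "4"
pos <- match(case_label, rownames(clean2))  # position of that row name
pos
# [1] 3

clean2[pos, ]  # the same row that clean2[case_label, ] selects
```

Assuming case is indeed the row name, the same lookup on the question's data would be match("416", rownames(biopsy2)), and the ID column of that row gives its position in biopsy2.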