
Removing Isolated Variables in UpSetR Plot


This is my situation:

library(UpSetR)

movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = TRUE, sep = ";")

upset(movies, sets = c("Action", "Adventure", "Comedy", "Drama", "Mystery",  "Thriller", "Romance", "War", "Western"), 
      order.by = "freq")

I would like to improve the plot by removing variables (genres) that are displayed alone, without any intersections with other variables.

How can I modify the code to remove these isolated variables?


Solution

  • You can filter those rows out of the data before you draw the plot. For example:

    sets <- c("Action", "Adventure", "Comedy", "Drama", "Mystery",  "Thriller", "Romance", "War", "Western")
    
    # keep only rows that belong to more than one of the selected sets
    reduced_data <- movies[rowSums(movies[, sets]) > 1, ]
    # or with dplyr (needs library(dplyr); pick() requires dplyr >= 1.1.0)...
    # reduced_data <- movies %>% filter(rowSums(pick(all_of(sets))) > 1)
    
    upset(reduced_data, sets = sets, 
          order.by = "freq")
    

    which gives you an UpSet plot with no single-set groups.
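
    To see why the rowSums() filter works, here is a minimal sketch on a made-up table. The `toy` data frame, its column names, and its values are hypothetical (not taken from movies.csv), and it assumes the set columns are 0/1 indicators, as the filter above does:

```r
# Hypothetical membership table: one row per item, one 0/1 column per set
toy <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Action = c(1, 0, 1, 0),
  Comedy = c(0, 1, 1, 0),
  Drama  = c(0, 0, 1, 0)
)
toy_sets <- c("Action", "Comedy", "Drama")

# Count how many of the selected sets each row belongs to
n_sets <- rowSums(toy[, toy_sets])

# Keep only rows that belong to at least two sets;
# rows in exactly one set would otherwise form the isolated bars
toy_reduced <- toy[n_sets > 1, ]
print(toy_reduced)
```

    Only the rows that sit in an intersection of two or more sets survive. Note that the same filter also drops rows that belong to none of the selected sets (a row sum of 0).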