I am looking for a way to subset my input data so I can make a second UpSet plot that shows, at higher resolution, the sample intersections with counts below 100 (for example). As an example, I'm using the tidy_movies data that ships with ggupset and appears in its documentation (https://github.com/const-ae/ggupset).
I've posted a 'photoshopped' version of the figure that I need to make.
library(tidyverse)
library(ggupset)
#tidy_movies
tidy_movies %>%
  distinct(title, year, length, .keep_all = TRUE) %>%
  ggplot(aes(x = Genres)) +
  geom_bar() +
  scale_x_upset(n_intersections = 20)
# + scale_x_continuous(limits = c(0,100)) ##This does not work when uncommented.
Ideally, I want a figure that looks like this:
Another approach would be to figure out how to subset tidy_movies:
class(tidy_movies)
# How could I create a new version of tidy_movies that isolates a specific set of combinations?
Thoughts? Suggestions?
You could create a new column comprising the pasted-together contents of the Genres list column, group_by this, and filter out any groups with n() >= 100:
library(tidyverse)
library(ggupset)
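tidy_movies %>%
  distinct(title, year, length, .keep_all = TRUE) %>%
  # Collapse each film's genre vector into one string per combination
  mutate(gen = sapply(Genres, paste, collapse = " ")) %>%
  group_by(gen) %>%
  # Keep only combinations that occur in fewer than 100 films
  filter(n() < 100) %>%
  ungroup() %>%
  ggplot(aes(x = Genres)) +
  geom_bar() +
  scale_y_continuous(limits = c(0, 200)) +
  scale_x_upset(n_intersections = 8)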
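If you would rather keep the subset as its own data frame (the "new version of tidy_movies" the question asks about), here is a minimal sketch of the same paste-and-filter idea on a toy tibble. The toy data, the cutoff max_n, and the column names combo and n_films are all made up for illustration. Sorting the genres before pasting is my own addition, on the assumption that the same combination could otherwise appear in different orders.

```r
library(dplyr)

# Hypothetical toy data standing in for tidy_movies:
# one row per film, with a list column of genres.
toy <- tibble(
  title  = c("A", "B", "C", "D", "E"),
  Genres = list("Drama", "Drama", "Drama",
                c("Comedy", "Drama"), c("Action", "Comedy"))
)

# Hypothetical cutoff; the question uses 100 for the real data.
max_n <- 3

rare <- toy %>%
  # One string per genre combination; sort() makes the key order-independent
  mutate(combo = vapply(Genres,
                        function(g) paste(sort(g), collapse = " "),
                        character(1))) %>%
  # Attach the number of films sharing each combination
  add_count(combo, name = "n_films") %>%
  # Keep only the combinations below the cutoff
  filter(n_films < max_n) %>%
  # Drop the helper columns so the result has the same shape as the input
  select(-combo, -n_films)
```

Here rare is an ordinary ungrouped tibble that still has the Genres list column, so it should be possible to pass it to ggplot(aes(x = Genres)) + geom_bar() + scale_x_upset() just like the full data.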