I am trying to plot an intersection graph using UpSetR for an Orthogroup gene count dataset (a sample is reproduced in the code below).
I need to highlight certain intersections; the names 'Mdong', 'Mfastidious', etc. need to be in italics, as they are names of bacteria; and the y-axis title has to be changed to 'Number of genes shared'. I used the following code and was able to get the intersections, but I don't know how to include the italics and the y-axis title. I tried including ggplot2 as well, but it did not work.
library(UpSetR)
library(ggplot2)
mydataframe <- data.frame(Orthogroup = c("OG0000000", "OG0000001", "OG0000002", "OG0000003", "OG0000004", "OG0000005", "OG0000006", "OG0000007", "OG0000008", "OG0000009", "OG0000010", "OG0000011", "OG0000012"), Mdong = c(9,3,7,0,3,0,5,3,8,14,0,4,6), Midriensis = c(6,8,9,0,6,0,5,6,6,4,0,8,9), Mcrassostreae = c(4,7,3,5,11,3,9,6,10,3,0,4,5))
selected_species <- colnames(mydataframe)[2:(ncol(mydataframe))]
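UpSetR expects each set column to be 0/1, so I convert the counts to presence/absence (species columns only, leaving Orthogroup untouched) -
mydataframe[selected_species] <- lapply(mydataframe[selected_species], function(x) as.integer(x > 0))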
Plot -
upset(mydataframe, sets = rev(selected_species), keep.order = T, order.by = "freq", queries=list(list(query = intersects, params = list("Mdong","Midriensis"), color = "red", active = T)), ylab("Number of genes shared"))
This seems to work well for highlighting the intersections, but the 'ylab()' is ignored. I also checked the examples for the ComplexUpset R package by krassowski, but I guess I don't know how to use UpSetR or ComplexUpset properly. How do I include all of this within a single command? Thanks in advance!
The supplied code wouldn't run for me, so I'm not sure if this is what you intended. Nonetheless, the example below does show how to put selected names in italics (Mdong in this example) and how to set a y-axis label. I've used ggupset because it plays nicely with ggplot.
library(tidyverse)
library(ggupset)
# to enable italics
library(ggtext)
mydataframe <-
  data.frame(
    Orthogroup = c(
      "OG0000000",
      "OG0000001",
      "OG0000002",
      "OG0000003",
      "OG0000004",
      "OG0000005",
      "OG0000006",
      "OG0000007",
      "OG0000008",
      "OG0000009",
      "OG0000010",
      "OG0000011",
      "OG0000012"
    ),
    Mdong = c(9, 3, 7, 0, 3, 0, 5, 3, 8, 14, 0, 4, 6),
    Midriensis = c(6, 8, 9, 0, 6, 0, 5, 6, 6, 4, 0, 8, 9),
    Mcrassostreae = c(4, 7, 3, 5, 11, 3, 9, 6, 10, 3, 0, 4, 5)
  )
set_df <- mydataframe |>
  pivot_longer(-Orthogroup) |>
  distinct(value, name) |>
  arrange(value, name) |>
  mutate(
    # select the names to be markdown formatted, e.g. italics
    name = if_else(name %in% c("Mdong"), str_c("*", name, "*"), name),
    gene = list(name),
    .by = value
  ) |>
  distinct(gene, value)
label_df <- set_df |> count(gene)
set_df |>
  ggplot(aes(gene)) +
  geom_bar() +
  geom_label(aes(y = n, label = n), data = label_df) +
  scale_x_upset() +
  labs(
    # with y-axis label
    x = "Combinations", y = "Number of genes shared",
    title = "Most Frequent Gene Combinations"
  ) +
  # to enable italics
  theme(axis.text.y = element_markdown())
Created on 2024-03-13 with reprex v2.1.0
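If you would rather stay with UpSetR for the y-axis title, here is a minimal sketch. `ylab()` is a ggplot2 layer and UpSetR does not pick it up, but `upset()` has its own `mainbar.y.label` argument for the main bar plot. The `binary_df` name is just an illustrative choice, and the counts are converted to 0/1 on the species columns only. Note one assumption about the sample data: as far as I can tell, it contains no orthogroup that occurs only in Mdong and Midriensis, so this sketch highlights the three-way intersection instead; with the real data you would put the original pair back in `params`. This does not address the italics, which is why the ggupset route is shown above.

```r
library(UpSetR)

mydataframe <- data.frame(
  Orthogroup = sprintf("OG%07d", 0:12),
  Mdong = c(9, 3, 7, 0, 3, 0, 5, 3, 8, 14, 0, 4, 6),
  Midriensis = c(6, 8, 9, 0, 6, 0, 5, 6, 6, 4, 0, 8, 9),
  Mcrassostreae = c(4, 7, 3, 5, 11, 3, 9, 6, 10, 3, 0, 4, 5)
)
selected_species <- setdiff(colnames(mydataframe), "Orthogroup")

# Convert counts to 0/1 presence/absence, touching only the species columns
binary_df <- mydataframe
binary_df[selected_species] <- lapply(
  binary_df[selected_species],
  function(x) as.integer(x > 0)
)

upset(
  binary_df,
  sets = rev(selected_species),
  keep.order = TRUE,
  order.by = "freq",
  queries = list(list(
    query = intersects,
    # assumption: three-way intersection, because the sample data
    # appears to have no Mdong + Midriensis-only orthogroups
    params = list("Mdong", "Midriensis", "Mcrassostreae"),
    color = "red",
    active = TRUE
  )),
  # UpSetR's own argument for the main bar plot's y-axis title
  mainbar.y.label = "Number of genes shared"
)
```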