I have a dataset where the presence/absence of mutations in 40 particular genes has been recorded comparing normal tissue (e.g. lung tissue) vs a tumour from that tissue (e.g. lung tumor) for twenty tissue types. I am struggling to find the best way to visualise this data.
A subset of the data:
Gene Lung_Normal Lung_Cancer Skin_Normal Skin_Cancer Brain_Normal Brain_Cancer
Gene_1 TRUE TRUE TRUE TRUE TRUE TRUE
Gene_2 TRUE TRUE TRUE TRUE TRUE TRUE
Gene_3 FALSE TRUE FALSE FALSE FALSE FALSE
Gene_4 FALSE FALSE FALSE FALSE FALSE FALSE
Gene_5 FALSE TRUE FALSE FALSE FALSE TRUE
Gene_6 FALSE FALSE TRUE TRUE TRUE TRUE
Gene_7 FALSE FALSE FALSE TRUE FALSE FALSE
Gene_8 FALSE FALSE FALSE TRUE FALSE TRUE
Gene_9 FALSE TRUE FALSE FALSE FALSE FALSE
Gene_10 FALSE FALSE FALSE TRUE FALSE TRUE
The key message we want to convey is that while the same 3-4 genes are often mutated in normal tissues, each tumor has many more additional genes mutated and there is more diversity in the tumors. I could just leave it as a table like this, but I would love to find a good way to visualise the information in a clear way.
I would like to try making a figure, like a circus plot, with a single circle with two rings representing all the data. The inner ring would be the normal tissues, the outer ring would be the cancer tissues, with each segment containing the relevant normal tissue on the inner ring and the relevant cancer tissue on the outer ring. Each gene would be colour coded and only shown if mutated. So for all normal tissues the segment would show 2-3 colours for the 2-3 mutated genes, while the outer cancer segment would show many more colour segments, representing the many more mutations.
However I have not found a plotting software that could create such a visualisation. Does anyone know of a way to make a visualisation like this? Even just pointing me towards an R package would be very helpful. I have looked into circos and radar plots but I have not found a package that can make the type of visualisation I have in mind, only showing the events that occur in each case.
If anyone thinks a different kind of visualisation could represent this data please let me know I would be happy to consider alternatives that represent the data with clarity.
Thank you in advance.
Not sure if this is what you're looking for, but I took a stab at it. Also, I'm not entirely sure from the description above what you want to do with the different types of cells - Lung, Skin, Brain? If this isn't what you're looking for, perhaps you could post a drawing of what the intended output should look like.
In the picture below, the inner ring is normal cells and the outer ring is cancer cells. My answer here benefited from this post.
## Make the data
tib <- tibble::tribble(
~Gene, ~Lung_Normal, ~Lung_Cancer, ~Skin_Normal, ~Skin_Cancer, ~Brain_Normal, ~Brain_Cancer,
"Gene_1", TRUE , TRUE , TRUE , TRUE , TRUE , TRUE,
"Gene_2", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
"Gene_3", FALSE , TRUE , FALSE , FALSE , FALSE , FALSE,
"Gene_4", FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
"Gene_5", FALSE , TRUE , FALSE , FALSE , FALSE , TRUE,
"Gene_6", FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
"Gene_7", FALSE , FALSE , FALSE , TRUE , FALSE , FALSE,
"Gene_8", FALSE, FALSE, FALSE, TRUE, FALSE, TRUE,
"Gene_9", FALSE , TRUE , FALSE , FALSE , FALSE , FALSE,
"Gene_10", FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
library(tidyr)
library(dplyr)
## Re-arrange into long format
tib <- tib %>%
pivot_longer(cols=-Gene, names_pattern="(.*)_(.*)", names_to=c("type", ".value")) %>%
pivot_longer(c(Normal, Cancer), names_to = "diag", values_to="val") %>%
# code colors as the gene if it's mutated, otherwise Unmutated
mutate(f = case_when(val ~ Gene, TRUE ~ "Unmutated")) %>%
group_by(Gene, f, diag) %>%
summarise(s = n()) %>%
mutate(diag = factor(diag, levels=c("Normal", "Cancer")),
f = factor(f, levels=c(paste("Gene", c(1,2,6,3,5,7,8,9,10,4), sep="_"), "Unmutated")))
library(ggplot2)
library(RColorBrewer)
ggplot(tib, aes(x=diag,
y = s,
fill=f)) +
geom_bar(stat="identity") +
coord_polar("y") +
theme_void() +
scale_fill_manual(values=c(brewer.pal(9, "Paired"), "gray75")) +
labs(fill = "Mutations")
EDIT
Here' is what it looks like with the data Allan suggested. This approach doesn't scale quite as well as the need for having lots of colors is going to make the plot less readable.
df <- structure(list(genes = c("Gene1", "Gene2", "Gene3", "Gene4",
"Gene5", "Gene6", "Gene7", "Gene8", "Gene9", "Gene10", "Gene11",
"Gene12", "Gene13", "Gene14", "Gene15", "Gene16", "Gene17", "Gene18",
"Gene19", "Gene20", "Gene21", "Gene22", "Gene23", "Gene24", "Gene25",
"Gene26", "Gene27", "Gene28", "Gene29", "Gene30", "Gene31", "Gene32",
"Gene33", "Gene34", "Gene35", "Gene36", "Gene37", "Gene38", "Gene39",
"Gene40"), bone_cancer = c(FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE,
TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE), bone_normal = c(FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
TRUE, FALSE, TRUE), brain_cancer = c(TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE), brain_normal = c(FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE), breast_cancer = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE), breast_normal = c(TRUE,
FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
FALSE), colon_cancer = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE), colon_normal = c(FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, TRUE, FALSE), kidney_cancer = c(FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
kidney_normal = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE,
TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE), liver_cancer = c(FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE), liver_normal = c(TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE), lung_cancer = c(TRUE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
lung_normal = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), prostate_cancer = c(TRUE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE,
FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, TRUE), prostate_normal = c(TRUE, FALSE, FALSE,
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE), skin_cancer = c(FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE), skin_normal = c(TRUE,
FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE,
FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE), thyroid_cancer = c(FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE,
FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE), thyroid_normal = c(FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)),
class = "data.frame", row.names = c(NA, 40L))
names(df)[1] <- "Gene"
tib <- df %>%
pivot_longer(cols=-Gene, names_pattern="(.*)_(.*)", names_to=c("type", ".value")) %>%
pivot_longer(c(normal, cancer), names_to = "diag", values_to="val") %>%
# code colors as the gene if it's mutated, otherwise Unmutated
mutate(f = case_when(val ~ Gene, TRUE ~ "Unmutated")) %>%
group_by(Gene, f, diag) %>%
summarise(s = n()) %>%
ungroup() %>%
group_by(Gene) %>%
mutate(diag = factor(diag, levels=c("normal", "cancer")))
levs <- tib %>%
dplyr::select(f, s) %>%
summarise(pct_mutated = sum(s*(f!= "Unmutated"))/sum(s)) %>%
arrange(-pct_mutated) %>%
dplyr::select(Gene) %>%
pull()
tib<- tib %>%
mutate(f = factor(f, levels=c(levs, "Unmutated")))
library(ggplot2)
library(RColorBrewer)
ggplot(tib, aes(x=diag,
y = s,
fill=f)) +
geom_bar(stat="identity") +
coord_polar("y") +
theme_void() +
scale_fill_manual(values=c(rainbow(length(levels(tib$f))-1), "gray75")) +
labs(fill = "Mutations")