r data-visualization radar-chart circos

Best approach to visualise presence/absence of events in multiple groups


I have a dataset where the presence/absence of mutations in 40 particular genes has been recorded comparing normal tissue (e.g. lung tissue) vs a tumour from that tissue (e.g. lung tumor) for twenty tissue types. I am struggling to find the best way to visualise this data.

A subset of the data:

Gene    Lung_Normal Lung_Cancer Skin_Normal Skin_Cancer Brain_Normal    Brain_Cancer
Gene_1  TRUE    TRUE    TRUE    TRUE    TRUE    TRUE
Gene_2  TRUE    TRUE    TRUE    TRUE    TRUE    TRUE
Gene_3  FALSE   TRUE    FALSE   FALSE   FALSE   FALSE
Gene_4  FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
Gene_5  FALSE   TRUE    FALSE   FALSE   FALSE   TRUE
Gene_6  FALSE   FALSE   TRUE    TRUE    TRUE    TRUE
Gene_7  FALSE   FALSE   FALSE   TRUE    FALSE   FALSE
Gene_8  FALSE   FALSE   FALSE   TRUE    FALSE   TRUE
Gene_9  FALSE   TRUE    FALSE   FALSE   FALSE   FALSE
Gene_10 FALSE   FALSE   FALSE   TRUE    FALSE   TRUE

The key message we want to convey is that while the same 3-4 genes are often mutated in normal tissues, each tumor has many more additional genes mutated and there is more diversity in the tumors. I could just leave it as a table like this, but I would love to find a good way to visualise the information in a clear way.

I would like to try making a figure, like a circos plot, with a single circle with two rings representing all the data. The inner ring would be the normal tissues and the outer ring the cancer tissues, with each segment containing the relevant normal tissue on the inner ring and the relevant cancer tissue on the outer ring. Each gene would be colour coded and only shown if mutated. So for all normal tissues the segment would show 2-3 colours for the 2-3 mutated genes, while the outer cancer segment would show many more colour segments, representing the many more mutations.

However, I have not found plotting software that can create such a visualisation. Does anyone know of a way to make one? Even just pointing me towards an R package would be very helpful. I have looked into circos and radar plots, but I have not found a package that can make the type of visualisation I have in mind, showing only the events that occur in each case.

If anyone thinks a different kind of visualisation could represent this data, please let me know. I would be happy to consider alternatives that represent the data with clarity.

Thank you in advance.


Solution

  • Not sure if this is what you're looking for, but I took a stab at it. Also, I'm not entirely sure from the description above what you want to do with the different types of cells - Lung, Skin, Brain? If this isn't what you're looking for, perhaps you could post a drawing of what the intended output should look like.

    In the picture below, the inner ring is normal cells and the outer ring is cancer cells. My answer here benefited from this post.

    ## Make the data
    tib <- tibble::tribble(
      ~Gene,    ~Lung_Normal, ~Lung_Cancer, ~Skin_Normal, ~Skin_Cancer, ~Brain_Normal,    ~Brain_Cancer,
    "Gene_1", TRUE    , TRUE    , TRUE    , TRUE    , TRUE    , TRUE,
    "Gene_2",   TRUE,     TRUE,     TRUE,     TRUE,     TRUE,     TRUE, 
    "Gene_3", FALSE   , TRUE    , FALSE   , FALSE   , FALSE   , FALSE,
    "Gene_4",   FALSE,    FALSE,    FALSE,    FALSE,    FALSE,    FALSE, 
    "Gene_5", FALSE   , TRUE    , FALSE   , FALSE   , FALSE   , TRUE,
    "Gene_6",   FALSE,    FALSE,    TRUE,     TRUE,     TRUE,     TRUE, 
    "Gene_7", FALSE   , FALSE   , FALSE   , TRUE    , FALSE   , FALSE,
    "Gene_8",   FALSE,    FALSE,    FALSE,    TRUE,     FALSE,    TRUE, 
    "Gene_9", FALSE   , TRUE    , FALSE   , FALSE   , FALSE   , FALSE,
    "Gene_10",  FALSE,    FALSE,    FALSE,    TRUE,     FALSE,    TRUE)
    
    library(tidyr)
    library(dplyr)
    
    ## Re-arrange into long format
    tib <- tib %>% 
      pivot_longer(cols=-Gene, names_pattern="(.*)_(.*)", names_to=c("type", ".value")) %>%  
      pivot_longer(c(Normal, Cancer), names_to = "diag", values_to="val") %>% 
      # code colors as the gene if it's mutated, otherwise Unmutated
      mutate(f = case_when(val ~ Gene, TRUE ~ "Unmutated")) %>% 
      group_by(Gene, f, diag) %>% 
      summarise(s = n()) %>% 
      mutate(diag = factor(diag, levels=c("Normal", "Cancer")), 
             f = factor(f, levels=c(paste("Gene", c(1,2,6,3,5,7,8,9,10,4), sep="_"), "Unmutated"))) 
    
    library(ggplot2)
    library(RColorBrewer)
    ggplot(tib, aes(x=diag, 
                    y = s, 
                    fill=f)) + 
      geom_bar(stat="identity") + 
      coord_polar("y") + 
      theme_void() + 
      scale_fill_manual(values=c(brewer.pal(9, "Paired"), "gray75")) + 
      labs(fill = "Mutations")
    

    [Image: two-ring polar bar chart; inner ring normal cells, outer ring cancer cells]
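
    If the tissue types need to stay separate, as the question describes, one option is to draw a small two-ring plot per tissue instead of pooling them. The following is only a sketch under my own assumptions: it uses five rows of the question's table, the names `wide`, `long` and `p` are made up for illustration, and it swaps the single circle for `facet_wrap()`, so it is not the exact figure described in the question. Only mutated genes are kept, so nothing is drawn for unmutated ones.

    ```r
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    ## Five rows copied from the question's table (illustrative subset)
    wide <- tibble::tribble(
      ~Gene,    ~Lung_Normal, ~Lung_Cancer, ~Skin_Normal, ~Skin_Cancer, ~Brain_Normal, ~Brain_Cancer,
      "Gene_1", TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,
      "Gene_2", TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,
      "Gene_3", FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE,
      "Gene_5", FALSE, TRUE,  FALSE, FALSE, FALSE, TRUE,
      "Gene_6", FALSE, FALSE, TRUE,  TRUE,  TRUE,  TRUE
    )

    ## One row per gene/tissue/diagnosis, keeping only the mutated genes
    long <- wide %>%
      pivot_longer(cols = -Gene,
                   names_to = c("type", "diag"),
                   names_sep = "_",
                   values_to = "mutated") %>%
      filter(mutated) %>%
      mutate(diag = factor(diag, levels = c("Normal", "Cancer")))

    ## Inner ring = Normal, outer ring = Cancer, one panel per tissue type
    p <- ggplot(long, aes(x = diag, fill = Gene)) +
      geom_bar(width = 0.9) +
      coord_polar("y") +
      facet_wrap(~ type) +
      theme_void() +
      labs(fill = "Mutated gene")

    print(p)
    ```

    Because the panels share one angular scale, a ring with fewer mutated genes covers less of the circle, so the normal-versus-cancer difference stays visible within each tissue.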


    EDIT

    Here is what it looks like with the data Allan suggested. This approach doesn't scale quite as well, since the need for lots of colors is going to make the plot less readable.

    df <- structure(list(genes = c("Gene1", "Gene2", "Gene3", "Gene4", 
    "Gene5", "Gene6", "Gene7", "Gene8", "Gene9", "Gene10", "Gene11", 
    "Gene12", "Gene13", "Gene14", "Gene15", "Gene16", "Gene17", "Gene18", 
    "Gene19", "Gene20", "Gene21", "Gene22", "Gene23", "Gene24", "Gene25", 
    "Gene26", "Gene27", "Gene28", "Gene29", "Gene30", "Gene31", "Gene32", 
    "Gene33", "Gene34", "Gene35", "Gene36", "Gene37", "Gene38", "Gene39", 
    "Gene40"), bone_cancer = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, 
    TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE), bone_normal = c(FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    TRUE, FALSE, TRUE), brain_cancer = c(TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE), brain_normal = c(FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, 
    FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE), breast_cancer = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE), breast_normal = c(TRUE, 
    FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, 
    FALSE), colon_cancer = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE), colon_normal = c(FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    TRUE, TRUE, FALSE), kidney_cancer = c(FALSE, FALSE, FALSE, FALSE, 
                                  FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
                                  TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, 
                                  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
                                  FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE), 
    kidney_normal = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, 
    TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, 
    TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE), liver_cancer = c(FALSE, 
    FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE), liver_normal = c(TRUE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, 
    TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
    FALSE), lung_cancer = c(TRUE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), 
    lung_normal = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), prostate_cancer = c(TRUE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, 
    FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, TRUE), prostate_normal = c(TRUE, FALSE, FALSE, 
    FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE), skin_cancer = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE), skin_normal = c(TRUE, 
    FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, 
    TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, 
    TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, 
    FALSE, FALSE, FALSE), thyroid_cancer = c(FALSE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, 
    FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE), thyroid_normal = c(FALSE, FALSE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)), 
    class = "data.frame", row.names = c(NA, 40L))
    names(df)[1] <- "Gene"
    tib <- df %>% 
      pivot_longer(cols=-Gene, names_pattern="(.*)_(.*)", names_to=c("type", ".value")) %>%  
      pivot_longer(c(normal, cancer), names_to = "diag", values_to="val") %>% 
      # code colors as the gene if it's mutated, otherwise Unmutated
      mutate(f = case_when(val ~ Gene, TRUE ~ "Unmutated")) %>% 
      group_by(Gene, f, diag) %>% 
      summarise(s = n()) %>% 
      ungroup() %>% 
      group_by(Gene) %>% 
      mutate(diag = factor(diag, levels=c("normal", "cancer")))
             
    
    ## Order the genes from most to least frequently mutated
    levs <- tib %>% 
      group_by(Gene) %>% 
      summarise(pct_mutated = sum(s*(f != "Unmutated"))/sum(s)) %>% 
      arrange(-pct_mutated) %>% 
      pull(Gene)
    
    
    tib<- tib %>% 
      mutate(f = factor(f, levels=c(levs, "Unmutated")))
    
    
    
    library(ggplot2)
    library(RColorBrewer)
    ggplot(tib, aes(x=diag, 
                    y = s, 
                    fill=f)) + 
      geom_bar(stat="identity") + 
      coord_polar("y") + 
      theme_void() + 
      scale_fill_manual(values=c(rainbow(length(levels(tib$f))-1), "gray75")) + 
      labs(fill = "Mutations")
    

    [Image: the same two-ring plot drawn for the 40-gene data]
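
    Since the question is open to alternatives, a tile plot (presence/absence grid) avoids needing one colour per gene, which is where the ring plot starts to struggle. This is a minimal sketch, not a tested recipe for the full data: it reuses five rows of the question's table, and the names `wide`, `long`, `gene_order`, `status` and `p` are my own for illustration.

    ```r
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    ## Five rows copied from the question's table (illustrative subset)
    wide <- tibble::tribble(
      ~Gene,    ~Lung_Normal, ~Lung_Cancer, ~Skin_Normal, ~Skin_Cancer, ~Brain_Normal, ~Brain_Cancer,
      "Gene_1", TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,
      "Gene_3", FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE,
      "Gene_5", FALSE, TRUE,  FALSE, FALSE, FALSE, TRUE,
      "Gene_6", FALSE, FALSE, TRUE,  TRUE,  TRUE,  TRUE,
      "Gene_8", FALSE, FALSE, FALSE, TRUE,  FALSE, TRUE
    )

    ## One row per gene/tissue/diagnosis, unmutated cells included
    long <- wide %>%
      pivot_longer(cols = -Gene,
                   names_to = c("type", "diag"),
                   names_sep = "_",
                   values_to = "mutated") %>%
      mutate(diag = factor(diag, levels = c("Normal", "Cancer")),
             status = if_else(mutated, "Mutated", "Unmutated"))

    ## Sort genes so the most frequently mutated ones end up at the top
    gene_order <- long %>%
      group_by(Gene) %>%
      summarise(n_mut = sum(mutated), .groups = "drop") %>%
      arrange(n_mut) %>%
      pull(Gene)

    long <- long %>%
      mutate(Gene = factor(Gene, levels = gene_order))

    ## Genes as rows, tissues as columns, one panel per diagnosis
    p <- ggplot(long, aes(x = type, y = Gene, fill = status)) +
      geom_tile(colour = "white") +
      facet_wrap(~ diag) +
      scale_fill_manual(values = c(Mutated = "firebrick", Unmutated = "gray90")) +
      labs(x = NULL, y = NULL, fill = NULL) +
      theme_minimal()

    print(p)
    ```

    With the genes sorted by how often they are mutated, the handful shared across normal tissues should form a solid band at the top of the Normal panel, while the Cancer panel fills in further down.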