r riverplot

RiverPlot for biological Pathways


I am trying to draw a river plot that shows how pathways become significant from disease stage-1 to stage-2, based on their p-values. I want the width of each flow between the two stages to reflect the p-value of the same pathway. For example, p1 and p2 are two pathways, and p1 becomes significant as the disease progresses from stage-1 to stage-2 (its p-value changes from 0.8 to 0.01). Pathway p1 has 12 mutated genes in stage-1 and 9 in stage-2. Conversely, pathway p2 was significant in stage-1 but not in stage-2 (its p-value changes from 0.02 to 0.7); it has 10 mutated genes in stage-1 and 5 in stage-2. This information is shown in the data frame below:

pathway <- c('p1','p1','p2','p2')
disease <- c('Stage-2','Stage-1','Stage-2','Stage-1')
pval <- c(0.01,0.8,0.7,0.02)
ngenes <- c(9,12,5,10)
df <- data.frame(pathway, disease, pval,ngenes)

I am using ggalluvial to draw the river plot. The code is shown below:

library(ggalluvial)
ggplot(df,
       aes(x = disease, stratum = pathway, alluvium = pval,
           y = ngenes,
           fill = pathway, label = pathway)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("River Plot")

And the plot I am getting is:

River Plot

This is not what I was expecting. The flows (the connections between stage-1 and stage-2) are missing. Their width should be based on the p-value, i.e. for p1 the flow should be wider at stage-1 and narrower at stage-2 (or the reverse). Can anyone suggest how to add the flows to this river plot?


Solution

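  • I was missing a crucial piece of information about river plots: the plot needs to know which rows belong together across the stages. My original data frame did not provide this properly. So I added a column, sub, that identifies each pathway across both stages, mapped it to alluvium, and mapped the p-value to y: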

    pathway <- c('p1','p1','p2','p2')
    disease <- c('Stage-2','Stage-1','Stage-2','Stage-1')
    # sub: one id per pathway, shared by both of its stages
    sub <- c(2,2,1,1)
    pval <- c(0.01,0.8,0.7,0.02)
    df <- data.frame(pathway, disease, pval, sub)

    library(ggalluvial)
    # alluvium = sub links the two rows of a pathway;
    # y = pval sets the width at each stage
    ggplot(df,
           aes(x = disease, stratum = pathway, alluvium = sub,
               y = pval,
               fill = pathway, label = pathway)) +
      scale_x_discrete(expand = c(.1, .1)) +
      geom_flow() +
      geom_stratum(alpha = .5) +
      geom_text(stat = "stratum", size = 3) +
      theme(legend.position = "none") +
      ggtitle("River Plot")
    

    With the above code, I got the following graph:

    River Plot

    This is exactly what I was looking for. Pathway p1, which becomes significant while progressing from stage-1 to stage-2, is wide at stage-1 and narrows as it becomes more significant in stage-2. Pathway p2 can be read the same way.
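A likely reason the first attempt drew no flows is that alluvium was mapped to pval, and every p-value occurs at one stage only, so no alluvium id is present at both Stage-1 and Stage-2. The sketch below checks this with base R. The helper stages_per_alluvium is a hypothetical name introduced here for illustration and is not part of ggalluvial. It counts, for each candidate alluvium id, how many distinct stages it appears at.

```r
# Same data as in the question and the solution above.
df <- data.frame(
  pathway = c("p1", "p1", "p2", "p2"),
  disease = c("Stage-2", "Stage-1", "Stage-2", "Stage-1"),
  pval    = c(0.01, 0.8, 0.7, 0.02),
  sub     = c(2, 2, 1, 1)
)

# Hypothetical helper (not a ggalluvial function): for each value of the
# candidate alluvium column, count the distinct x positions (stages)
# at which it occurs.
stages_per_alluvium <- function(data, id_col, x_col = "disease") {
  tapply(data[[x_col]], data[[id_col]], function(x) length(unique(x)))
}

stages_per_alluvium(df, "pval")     # each p-value occurs at a single stage
stages_per_alluvium(df, "sub")      # each id occurs at both stages
stages_per_alluvium(df, "pathway")  # each pathway occurs at both stages
```

An id that occurs at only one stage has nothing to connect to, which matches the missing flows in the first plot. The last call suggests that mapping alluvium = pathway directly might work without the extra sub column, although that variant was not tried here.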