I am trying to draw a river plot to show how pathways become significant from disease stage-1 to stage-2, based on their p-values. I want the width of each flow between the two stages to reflect the p-value of the same pathway. For example, p1 and p2 are two pathways, and p1 becomes significant as the disease progresses from stage-1 to stage-2 (its p-value changes from 0.8 to 0.01). Pathway p1 has 12 mutated genes in stage-1 and 9 in stage-2. Conversely, pathway p2 is significant in stage-1 (p-value 0.02) but not in stage-2 (p-value 0.7), with 10 mutated genes in stage-1 and 5 in stage-2. This information is shown in the data frame below:
pathway <- c('p1','p1','p2','p2')
disease <- c('Stage-2','Stage-1','Stage-2','Stage-1')
pval <- c(0.01,0.8,0.7,0.02)
ngenes <- c(9,12,5,10)
df <- data.frame(pathway, disease, pval,ngenes)
Now I am using ggalluvial to draw the river plot. The code is shown below:
library(ggalluvial)
ggplot(df,
       aes(x = disease, stratum = pathway, alluvium = pval,
           y = ngenes,
           fill = pathway, label = pathway)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("River Plot")
And the plot I am getting is:
This is not what I was expecting. The flows (the connections between stage-1 and stage-2) are missing. Their width should be based on the p-value, i.e. for p1, wider at stage-1 and narrower at stage-2 (or the reverse). Can anyone suggest how to add flows to this river plot?
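I was missing a crucial piece of information about river plots: the alluvium aesthetic has to identify which rows belong together across the stages. In my original data frame I was not supplying this properly, because pval has a different value in every row, so no row in stage-1 was linked to a row in stage-2. So I made the following changes, adding a column sub that is shared by the two rows of each pathway and mapping the p-value to y: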
pathway <- c('p1','p1','p2','p2')
disease <- c('Stage-2','Stage-1','Stage-2','Stage-1')
sub <- c(2,2,1,1)
pval <- c(0.01,0.8,0.7,0.02)
df <- data.frame(pathway, disease, pval,sub)
library(ggalluvial)
ggplot(df,
       aes(x = disease, stratum = pathway, alluvium = sub,
           y = pval,
           fill = pathway, label = pathway)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("River Plot")
With the above code, I got the following graph:
This is exactly what I was looking for. Pathway p1, which becomes significant while progressing from stage-1 to stage-2, is wide at stage-1 (p-value 0.8) and narrows at stage-2 (p-value 0.01) as it becomes more significant. Pathway p2 can be read the same way in reverse: it is narrow at stage-1 (p-value 0.02) and wide at stage-2 (p-value 0.7).
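To see why the first attempt drew no flows, it can help to count how many stages each candidate alluvium id covers. The following is only a minimal base-R sketch of that check, not part of ggalluvial; the helper name stages_per_id is made up for illustration. A flow can only be drawn for an id that appears in both stages.

```r
# Same data as in the corrected example above.
pathway <- c('p1','p1','p2','p2')
disease <- c('Stage-2','Stage-1','Stage-2','Stage-1')
sub <- c(2,2,1,1)
pval <- c(0.01,0.8,0.7,0.02)
df <- data.frame(pathway, disease, pval, sub)

# Hypothetical helper: for each value of a candidate id column,
# count the number of distinct stages in which it occurs.
stages_per_id <- function(id) {
  tapply(df$disease, id, function(s) length(unique(s)))
}

by_pval    <- stages_per_id(df$pval)     # id used in the first attempt
by_sub     <- stages_per_id(df$sub)      # id used in the corrected plot
by_pathway <- stages_per_id(df$pathway)  # the pathway name itself

print(by_pval)
print(by_sub)
print(by_pathway)
```

Every pval value occurs in a single stage only, so there is nothing to connect, whereas each sub value spans both stages. The pathway column spans both stages as well, which suggests that alluvium = pathway might serve the same purpose as the extra sub column, though I have only plotted the sub version.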