I am trying to plot a density plot for 3 variables over 7 different geographical points, but the output does not look as expected. N should be higher in the middle, but all the variables seem to show the same pattern, which does not match the data. Why is this, and how can I fix it?
Variable1 <- c(rep("E",7), rep("N",7),rep("L",7))
Variable2 <- c(rep(1:7, 3))
value <- c(12.44035, 11.98035333, 11.40821, 12.15833, 13.14826, 11.99339667, 12.17363, 4.073096, 3.946134667, 6.244152, 5.76892, 4.545772, 3.580206667, 2.879470667, 3.6912875, 3.501247, 2.684179, 3.06306, 3.364774, 4.485021333, 3.373649333)
df <- data.frame(Variable1, Variable2, value)
library(ggplot2)   # provides ggplot() and aes()
library(ggridges)
ggplot(df, aes(x = Variable2, y = Variable1)) +
  geom_density_ridges(aes(fill = Variable1))
You are calculating the density of your x-axis variable, which in your case is Variable2. That is the same thing (1, 2, ..., 7) for every Variable1, so every group gets the same density.
So I think that you want your x-axis to be value, and you actually don't need Variable2, as it's a mere index.
ggplot(df, aes(x=value, y=Variable1)) +
geom_density_ridges(aes(fill=Variable1))
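As a quick check of the explanation above, here is a minimal sketch in base R (no plotting needed) that estimates the density of Variable2 separately for each Variable1. The names x_by_group, dens and same_as_first are illustrative only.

```r
# Rebuild the two grouping columns from the question
Variable1 <- c(rep("E", 7), rep("N", 7), rep("L", 7))
Variable2 <- rep(1:7, 3)

# The x values each group contributes: always 1, 2, ..., 7
x_by_group <- split(Variable2, Variable1)

# Kernel density estimate of x for each group
dens <- lapply(x_by_group, function(x) density(x)$y)

# Compare every group's density curve with the first one
same_as_first <- vapply(
  dens,
  function(d) isTRUE(all.equal(d, dens[[1]])),
  logical(1)
)
same_as_first  # TRUE for every group: the inputs are identical
```

Because the input to the density estimate never changes between groups, the ridges cannot differ, whatever is stored in value.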
If what you want to show is how value changes across the 7 points, the geom you actually want is geom_line, or geom_smooth (for prettier graphs), or maybe geom_area for filling the area under the curves.
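For reference, a rough sketch of the unsmoothed options mentioned above, using the data from the question. The object names p_line and p_area, and the alpha and position settings, are my own choices for illustration.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  Variable1 = c(rep("E", 7), rep("N", 7), rep("L", 7)),
  Variable2 = rep(1:7, 3),
  value = c(12.44035, 11.98035333, 11.40821, 12.15833, 13.14826,
            11.99339667, 12.17363, 4.073096, 3.946134667, 6.244152,
            5.76892, 4.545772, 3.580206667, 2.879470667, 3.6912875,
            3.501247, 2.684179, 3.06306, 3.364774, 4.485021333,
            3.373649333)
)

# geom_line: connects the observed values of each variable
p_line <- ggplot(df, aes(x = Variable2, y = value, color = Variable1)) +
  geom_line()

# geom_area: same curves, filled down to zero; position = "identity"
# overlays the areas instead of stacking them
p_area <- ggplot(df, aes(x = Variable2, y = value, fill = Variable1)) +
  geom_area(alpha = 0.4, position = "identity")

p_line
p_area
```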
Now, one way of doing it would be putting all the curves on the same y scale:
ggplot(df, aes(x=Variable2, y=value, color=Variable1)) +
geom_smooth(fill=NA)
But this doesn't give the separation that you wanted. The way I know to get it is to make one plot for each Variable1 and arrange them together (there may be an option for this in the ggridges package, but I have never used it). To do that, we build a "base" graph:
g = ggplot(df, aes(x=Variable2, y=value)) +
geom_smooth(fill=NA) +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank())
Here we removed the x-axis so that it can be added only once in the grid. Then we apply that base to each variable, one at a time, with a for loop:
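for(i in unique(df$Variable1)){
  # keep only the rows of the current variable
  df2 = df[df$Variable1==i,]
  # swap the data into the base plot and store the result under the name i
  assign(i,
         g %+% df2 + ylab(i) +
           ylim(min(df2$value),max(df2$value)))}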
This creates one graph for each Variable1, named after the variable itself. Now we add the x-axis back in the last plot and arrange them together:
N = N + theme(axis.text.x = element_text(),
axis.title.x = element_text())
gridExtra::grid.arrange(E,L,N, nrow=3)
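Output (image not included): three stacked panels, E, L and N, with the x-axis drawn only on the bottom one.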
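A different route to a similar stacked layout, which is not the approach used in this answer, is ggplot2's own faceting. The sketch below assumes that one panel per Variable1 with an independent y range is acceptable; p_facets is an illustrative name, and the explicit method and formula arguments only spell out the smoother for this small dataset.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  Variable1 = c(rep("E", 7), rep("N", 7), rep("L", 7)),
  Variable2 = rep(1:7, 3),
  value = c(12.44035, 11.98035333, 11.40821, 12.15833, 13.14826,
            11.99339667, 12.17363, 4.073096, 3.946134667, 6.244152,
            5.76892, 4.545772, 3.580206667, 2.879470667, 3.6912875,
            3.501247, 2.684179, 3.06306, 3.364774, 4.485021333,
            3.373649333)
)

# One row of panels per Variable1; scales = "free_y" lets each panel
# pick its own y range, similar to the per-plot ylim() in the loop
p_facets <- ggplot(df, aes(x = Variable2, y = value)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  facet_grid(Variable1 ~ ., scales = "free_y")

p_facets
```

With faceting the x-axis is shared automatically, so there is no need to blank it and restore it by hand. The trade-off is less control over each individual panel than the loop gives.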
To use colors, first we don't pass the geom to g:
g = ggplot(df, aes(x=Variable2, y=value)) +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank())
Then we create a vector of colors that we'll use in the loop:
color = c("red", "green", "blue")
names(color) = unique(df$Variable1)
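A small sketch of how the named vector behaves when indexed by a variable name, which is what the loop relies on when it uses color[i]. Only the Variable1 column is rebuilt here, and the lookups are for illustration.

```r
# Grouping column from the question
Variable1 <- c(rep("E", 7), rep("N", 7), rep("L", 7))

color <- c("red", "green", "blue")
# unique() keeps first-appearance order: "E", "N", "L"
names(color) <- unique(Variable1)

color[["E"]]  # "red"
color[["N"]]  # "green"
color[["L"]]  # "blue"
```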
Then we pass the color argument inside the geom that we omitted earlier.
But first, let me talk about the available geoms. We could use a smoothed area, stat_smooth(geom="area"). That looks good, but it leaves a lot of useless area under the graphs, because the fill extends all the way down to zero. To change that, we can use geom_ribbon with the argument aes(ymin=min(value)-0.1, ymax=value) together with ylim(min(df2$value)-0.1, max(df2$value)), which stops the graph at the minimal value (minus 0.1). The problem is that the smoothing function of ggplot doesn't work well with geom_ribbon, so we only have the option of a "rough" graph.
Code for the smooth area:
for(i in unique(df$Variable1)){
df2 = df[df$Variable1==i,]
assign(i,
g %+% df2 + ylab(i) +
stat_smooth(geom="area", fill=color[i]))}
Code for the rough ribbon:
for(i in unique(df$Variable1)){
df2 = df[df$Variable1==i,]
assign(i,
g %+% df2 + ylab(i) + ylim(min(df2$value)-0.1,max(df2$value)) +
geom_ribbon(aes(ymax=value, ymin=min(value)-0.1), fill=color[i]))}
I searched for a way to work around that smoothing problem but found nothing. I'll post a question on the site, and if I find a solution I'll show it here!
After asking here, I found that using after_stat inside the aes argument of stat_smooth(geom="ribbon", aes(...)) solves it (for more information, read the link).
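for(i in unique(df$Variable1)){
  df2 = df[df$Variable1==i,]
  assign(i,
         g %+% df2 + ylab(i) +
           # after_stat() sees the columns computed by the smoother,
           # where the smoothed values are called y (not value)
           stat_smooth(geom="ribbon", fill=color[i],
                       aes(ymax=after_stat(y), ymin=after_stat(min(y))-0.1)))}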
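To see which columns after_stat() can refer to, one can inspect the data that stat_smooth computes. This is a minimal sketch, assuming only the E rows of the question's data and ggplot2's layer_data() helper. The names df_e, p and computed are illustrative.

```r
library(ggplot2)

# The seven E values from the question
df_e <- data.frame(
  Variable2 = 1:7,
  value = c(12.44035, 11.98035333, 11.40821, 12.15833, 13.14826,
            11.99339667, 12.17363)
)

# A smoothed line; loess may emit warnings with so few points
p <- ggplot(df_e, aes(x = Variable2, y = value)) +
  stat_smooth(geom = "line", method = "loess", formula = y ~ x, se = FALSE)

# Data frame produced by the stat for the first layer
computed <- layer_data(p)
names(computed)
```

The smoothed values are stored in a column named y, evaluated on a grid stored in x. Expressions inside after_stat() are evaluated against these computed columns rather than the original column names of the data frame.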