I am trying to produce a plot in ggplot for a multinomial logistic regression. Not all levels of my nominal dependent variable are observed in each factor level. I want a plot with bars of even width. I can get the mean of each factor to show up using geom_bar with even-width bars once I use position_dodge(preserve='single'), but I cannot get geom_point to align the same way.
Here is my data; decide is the nominal dependent variable:
decide=c("h", "g", "h", "g", "h", "g", "g", "h", "g", "h", "g", "h", "h", "h", "h", "h", "g", "h", "h", "r", "g", "h", "h", "h", "g", "g", "g", "h", "h", "h","h", "h", "h", "r", "h", "g", "g", "h", "g", "h", "g", "h", "g", "h", "d", "h", "h", "r", "h", "h", "g", "g", "g", "h", "g", "g", "g", "g", "h", "h")
dcsz=c("small", "medium", "small", "small", "medium", "small", "small", "medium", "medium", "small", "small", "medium", "small", "medium", "small", "medium", "small", "medium", "small", "small", "medium", "small", "medium", "medium", "medium", "small", "small", "medium", "small", "medium", "small", "medium", "small", "medium", "medium", "medium", "small", "medium", "medium", "small", "medium", "small", "medium", "medium", "small", "small", "medium", "small", "medium", "medium", "medium", "small", "small", "small", "small", "medium", "medium", "small", "small", "medium")
disthome=c(9.2,10.0,5.0,0.8,6.5,2.0,6.8,1.6,6.9,4.4,5.8,6.2,4.7,0.6,3.0,4.7,5.8,1.5,5.8,4.5,3.2,4.6,2.9,4.1,6.5,4.8,9.1,4.7,4.3,4.2,4.8,3.5,5.4,7.1,3.0,5.3,1.0,5.2,2.2,1.7,6.0,6.1,3.1,2.4,4.3,5.1,7.2,9.8,6.9,3.1,8.8,0.9,9.7,2.2,5.4,4.4,6.8,8.3,5.4,2.2)
gohome=data.frame(decide, dcsz, disthome)
Here is how I got the mean and standard error:
library(dplyr)
# mean and standard error of disthome for each dcsz/decide combination
gohome.disthome <- gohome %>%
  group_by(dcsz,decide) %>%
  summarise(meandisthome = mean(na.omit(disthome)),
            sedisthome=sd(na.omit(disthome))/sqrt(n()))
Now to the nitty-gritty. Here is my original code, before I managed to align the error bars with the mean bars and separate the points by the nominal variable:
ggplot(gohome,aes(y=disthome, x=dcsz, fill = decide)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity", position=position_dodge(),
data=gohome.disthome,aes(x=dcsz,y=meandisthome)) +
#overlay data points
geom_point(position=position_dodge()) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(),
aes(x=dcsz, fill = decide,y=meandisthome,
ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome),
width=0.3)+
#flip axis
coord_flip()
Here is the code where I got the error bars to align with the mean bars (using 0.9 in position_dodge), separated the points by the nominal variable (0.9), and also got the error bars and mean bars to all be the same width even though not all levels of the dependent variable were observed in each factor level (I added preserve="single" in position_dodge). I cannot add preserve='single' to geom_point, otherwise it does not separate the points by the nominal variable, and using preserve='total' doesn't do anything either:
ggplot(gohome,aes(y=disthome, x=dcsz, fill = decide)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity",position=position_dodge(preserve='single'),
data=gohome.disthome,aes(x=dcsz,y=meandisthome))+
#overlay data points
geom_point(position=position_dodge(0.9)) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(0.9,preserve = "single"),
aes(x=dcsz, fill = decide,y=meandisthome,
ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome),
width=0.3)+
#flip axis
coord_flip()
I've also tried using position_dodge2 instead of position_dodge in different combinations with preserve='total', but that doesn't solve it either. Either the points stay the same or they become a complete scatter with no separation. I got the idea to use position_dodge2 and preserve='total' from the following link, since my problem is very similar (not sure why mine isn't working): https://github.com/tidyverse/ggplot2/issues/2712
Can someone please help me fix my code? I need the points to line up perfectly with all the error bars.
The issue is that you did not set the grouping variable in geom_errorbar and geom_point. From the docs:
position_dodge() requires the grouping variable to be specified in the global or geom_* layer.
Try this:
library(dplyr)
library(ggplot2)
ggplot(gohome,aes(y=disthome, x=dcsz)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity",
position=position_dodge(),
data=gohome.disthome,
aes(x=dcsz, y=meandisthome, fill = decide)) +
#overlay data points
geom_point(aes(group = decide), position=position_dodge(width = 0.9)) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(width = 0.9),
aes(x=dcsz,
group = decide,
y=meandisthome,ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome), width = 0.5)+
#flip axis
coord_flip()
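To see why the bars end up with different widths in the first place, it can help to list which decide/dcsz combinations are never observed. Here is a minimal sketch on made-up toy data (the data frame toy and its values are hypothetical, not your data; swap in gohome to check the real thing):

```r
# Hypothetical toy data: level "d" is never observed for size "medium"
toy <- data.frame(
  decide = c("h", "g", "d", "h", "g"),
  dcsz   = c("small", "small", "small", "medium", "medium")
)

# Cross-tabulate the two variables and turn the table into a data frame
# with one row per combination and its count in the Freq column
counts <- as.data.frame(table(dcsz = toy$dcsz, decide = toy$decide))

# Combinations with a count of zero are the ones missing from the plot
unobserved <- subset(counts, Freq == 0)
unobserved
```

Every row in unobserved is a bar that position_dodge() has nothing to draw for, which is why the remaining bars in that factor level get stretched unless you use preserve = "single" or fill in the missing combinations.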
EDIT: After a lot of googling and trying several combinations, the best solution I can come up with to get bars of the same width is to simply fill up the data frame using tidyr::complete(decide, dcsz).
gohome <- data.frame(decide,dcsz,disthome) %>%
tidyr::complete(decide, dcsz)
gohome.disthome <- gohome %>% group_by(dcsz,decide) %>%
summarise(meandisthome = mean(na.omit(disthome)), sedisthome=sd(na.omit(disthome))/sqrt(n()))
#> `summarise()` regrouping output by 'dcsz' (override with `.groups` argument)
ggplot(gohome,aes(y=disthome, x=dcsz)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity",
position=position_dodge(),
data=gohome.disthome,
aes(x=dcsz, y=meandisthome, fill = decide)) +
#overlay data points
geom_point(aes(group = decide), position=position_dodge(width = 0.9)) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(width = 0.9),
aes(x=dcsz,
group = decide,
y=meandisthome,ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome), width = 0.5)+
#flip axis
coord_flip()
Created on 2020-06-29 by the reprex package (v0.3.0)
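In case it is not obvious what tidyr::complete() does to the data, here is a small sketch on made-up toy data (toy and its values are hypothetical and only for illustration). It adds one row for each combination of the listed columns that is not present and fills the other columns with NA:

```r
library(tidyr)

# Hypothetical toy data: level "d" is never observed for size "medium"
toy <- data.frame(
  decide   = c("h", "g", "d", "h", "g"),
  dcsz     = c("small", "small", "small", "medium", "medium"),
  disthome = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# Add a row for every decide/dcsz combination that is not in the data;
# disthome is NA in the added rows
toy_full <- complete(toy, decide, dcsz)

nrow(toy)       # 5 observed rows
nrow(toy_full)  # 6 rows: 3 decide levels x 2 dcsz levels
```

With the NA rows in place every group exists in every factor level, so the dodging is the same in all layers. I would expect ggplot2 to warn about removing the rows with missing values when drawing the points, which should be harmless here.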