
How can I add a measure of variability (error bars) to a ggplot bar chart?


I'd like to add error bars to my ggplot, but it doesn't work. What is the problem with my code below?

First, I simulate some data:

set.seed(123)
data<-NULL
data$MeanDecreaseAccuracy <- rnorm(60,10)
data$Feature <- ifelse(data$MeanDecreaseAccuracy > 9, 
c("red"), c("blue"))
data$MeanDecreaseGini <- rpois(60,7)
data<-as.data.frame(data)

Then I create a ggplot with the means and error bars.

Calculate the means:

res2<-aggregate(as.numeric(MeanDecreaseGini) ~ Feature , data, mean)
colnames(res2)<-c("Feature","MeanDecreaseGini")

Calculate the standard errors for the error bars:

st.err <- function(x, na.rm=FALSE) {
     if(na.rm==TRUE) x <- na.omit(x)
     sd(x)/sqrt(length(x))
     }
sd <- aggregate(as.numeric(MeanDecreaseGini) ~ Feature, data, st.err)
colnames(sd)<-c("Feature","MeanDecreaseGini")
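As an aside, the means and standard errors can also be computed in a single aggregate() call by returning both statistics from one function. This is a minimal sketch that recreates the simulated data so it runs on its own; the helper name mean_se_stats and the column name se are hypothetical. Because the function returns a named vector, aggregate() stores the result as a matrix column, which do.call(data.frame, ...) flattens into ordinary columns.

```r
set.seed(123)
# Recreate the simulated data from the question (same random draws, same order)
data <- data.frame(MeanDecreaseAccuracy = rnorm(60, 10))
data$Feature <- ifelse(data$MeanDecreaseAccuracy > 9, "red", "blue")
data$MeanDecreaseGini <- rpois(60, 7)

# Hypothetical helper: return the mean and the standard error as a named vector
mean_se_stats <- function(x) {
  c(mean = mean(x), se = sd(x) / sqrt(length(x)))
}

stats <- aggregate(MeanDecreaseGini ~ Feature, data = data, FUN = mean_se_stats)
# aggregate() packs both statistics into one matrix column; flatten it
stats <- do.call(data.frame, stats)
names(stats) <- c("Feature", "MeanDecreaseGini", "se")
print(stats)
```

Keeping the mean and the standard error in one data frame from the start avoids the separate merge step.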

Plot with ggplot:

  library(ggplot2)

  ggplot(res2, aes(x = Feature,
                   y = MeanDecreaseGini)) +
    geom_bar(stat = 'identity') +
    coord_flip() +
    theme_classic() +
    labs(
      x     = "Feature",
      y     = "Importance",
      title = "Feature Importance") +
    geom_errorbar(aes(ymin = MeanDecreaseGini - sd, ymax = MeanDecreaseGini + sd))
Running this gives:

Error: Columns `ymin`, `ymax` must be 1d atomic vectors or lists
In addition: Warning messages:
1: In Ops.factor(left, right) : ‘-’ not meaningful for factors
2: In Ops.factor(left, right) : ‘+’ not meaningful for factors
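The same arithmetic can be reproduced in base R. Inside aes(), a name that is not a column of the plotted data is looked up in the surrounding environment, so sd here is the whole two-column data frame from the aggregate() step. This sketch uses made-up two-row stand-ins for res2 and sd (the numbers are hypothetical), with Feature as an explicit factor to match the warnings above:

```r
# Hypothetical stand-ins for the two aggregated data frames in the question
res2 <- data.frame(Feature = factor(c("blue", "red")),
                   MeanDecreaseGini = c(7.1, 6.8))
sd <- data.frame(Feature = factor(c("blue", "red")),
                 MeanDecreaseGini = c(0.9, 0.4))

# What ymin = MeanDecreaseGini - sd effectively computes: a numeric vector minus
# a whole data frame, applied column by column. The factor column triggers
# the "'-' not meaningful for factors" warning and becomes NA.
ymin <- res2$MeanDecreaseGini - sd
print(class(ymin))  # a data frame, not the numeric vector ggplot2 needs
print(ymin)
```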

Solution

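  • The problem is that sd in aes(ymin = MeanDecreaseGini - sd, ...) is not a column of res2, so ggplot2 looks it up in your workspace and finds the whole sd data frame. Subtracting that data frame (whose Feature column is a factor) produces the warnings, and the result is a data frame rather than a numeric vector, hence the error. First, give the standard-error column the name sd, the way your aes() call uses it: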

    colnames(sd)<-c("Feature","sd")
    

    Then add the sd column to the data frame you are plotting (merge() joins the two data frames on their shared Feature column):

    res2 = merge(res2, sd)
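Putting the steps together, here is a self-contained sketch of the fixed code. It recreates the simulated data from the question and assumes ggplot2 is installed. The only substantive changes from the question are renaming the standard-error column to sd and merging it into res2, so that aes() finds sd as a column of the plotted data. The error-bar width of 0.3 is an arbitrary choice.

```r
library(ggplot2)

set.seed(123)
data <- data.frame(MeanDecreaseAccuracy = rnorm(60, 10))
data$Feature <- ifelse(data$MeanDecreaseAccuracy > 9, "red", "blue")
data$MeanDecreaseGini <- rpois(60, 7)

# Means per group
res2 <- aggregate(as.numeric(MeanDecreaseGini) ~ Feature, data, mean)
colnames(res2) <- c("Feature", "MeanDecreaseGini")

# Standard errors per group
st.err <- function(x, na.rm = FALSE) {
  if (na.rm) x <- na.omit(x)
  sd(x) / sqrt(length(x))
}
sd <- aggregate(as.numeric(MeanDecreaseGini) ~ Feature, data, st.err)
colnames(sd) <- c("Feature", "sd")  # the column name used inside aes()

# Add the sd column to the data being plotted (joined on Feature)
res2 <- merge(res2, sd)

p <- ggplot(res2, aes(x = Feature, y = MeanDecreaseGini)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = MeanDecreaseGini - sd, ymax = MeanDecreaseGini + sd),
                width = 0.3) +
  coord_flip() +
  theme_classic() +
  labs(x = "Feature", y = "Importance", title = "Feature Importance")
print(p)
```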
    

    With that change, the plot works and shows one error bar per feature, from MeanDecreaseGini - sd to MeanDecreaseGini + sd.


    You may want to adjust the colour or width of the error bars, for example with the colour and width arguments of geom_errorbar().
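An alternative that is not part of the answer above is to let ggplot2 compute both the means and the standard errors from the raw data with stat_summary(), which removes the aggregate() and merge() steps entirely. This is a sketch that assumes ggplot2 3.3.0 or later (for the fun argument) and reuses the question's simulated data. mean_se() is ggplot2's helper that returns the mean plus or minus one standard error, matching the st.err() bars. The width and colour values are arbitrary and only show where the error bars can be adjusted.

```r
library(ggplot2)

set.seed(123)
data <- data.frame(MeanDecreaseAccuracy = rnorm(60, 10))
data$Feature <- ifelse(data$MeanDecreaseAccuracy > 9, "red", "blue")
data$MeanDecreaseGini <- rpois(60, 7)

# Bars show the group mean; error bars show mean +/- one standard error
p <- ggplot(data, aes(x = Feature, y = MeanDecreaseGini)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.3, colour = "grey40") +
  coord_flip() +
  theme_classic() +
  labs(x = "Feature", y = "Importance", title = "Feature Importance")
print(p)
```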