ggplot2 legend scale na

Why doesn't scale_shape_manual behave in the same way as scale_color_manual and scale_fill_manual?


Say I have a data frame like myiris below, where I want to just highlight the setosa species.

I don't want, however, the other species to show in the legend. For my convenience, I just made all the rest be NA in a new Highlight column.

I do the following:

data(iris)
library(ggplot2)
myiris <- data.frame(iris$Sepal.Length, iris$Petal.Length, Highlight=as.character(iris$Species))
names(myiris)[1:2] <- c('Sepal.Length', 'Petal.Length')
myiris$Highlight[myiris$Highlight!="setosa"] <- NA
myiris$Highlight <- factor(myiris$Highlight, levels="setosa")
plot_palette <- c("red","gray70")

P <- ggplot(myiris, aes(x=Sepal.Length, y=Petal.Length, color=Highlight)) +
     geom_point(pch=16, size=5, alpha=0.5) +
     scale_color_manual(values=plot_palette, breaks='setosa')
P

This produces the following plot, which is exactly what I expect:

p1

However, I would also like the point shape to depend on Highlight, with setosa points filled and NA points hollow.

I use scale_shape_manual in exactly the same way I just used scale_color_manual:

P <- ggplot(myiris, aes(x=Sepal.Length, y=Petal.Length, color=Highlight, shape=Highlight)) +
     geom_point(size=5, alpha=0.5) +
     scale_color_manual(values=plot_palette, breaks='setosa') +
     scale_shape_manual(values=c(16,1), breaks='setosa')

However, I get:

Warning message: Removed 100 rows containing missing values or values outside the scale range (geom_point()).

And the plot produced is this:

p2

Why does scale_shape_manual behave differently from its counterpart functions, and how can I correct this to get what I need (color and shape as a function of Highlight, with no NA group in the legend)?

EDIT

NAs in the Highlight column are indeed not the problem. You could try to accomplish the same using the original iris (instead of myiris) and the Species column (instead of Highlight), but the same problem occurs:

P <- ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species)) +
     geom_point(pch=16, size=5, alpha=0.5) +
     scale_color_manual(values=c('red','gray70','gray70'), breaks='setosa')

and

P <- ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, color=Species, shape=Species)) +
     geom_point(size=5, alpha=0.5) +
     scale_color_manual(values=c('red','gray70','gray70'), breaks='setosa') +
     scale_shape_manual(values=c(16,1,1), breaks='setosa')

Solution

  • Basically, scale_color_manual and scale_shape_manual work the same way: because values= is an unnamed vector, ggplot2 assigns na.value= to the categories excluded from breaks= in both cases. For scale_color_manual the default na.value is "grey50" (the difference from "gray70" is hardly visible, but you can see it using layer_data()), whereas for scale_shape_manual it is NA, so those points get no shape and are removed with the warning shown in the question.

    Hence, one fix for your issue would be to explicitly set the na.value=:

    library(ggplot2)
    
    ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species, shape = Species)) +
      geom_point(size = 5, alpha = 0.5) +
      scale_color_manual(
        values = c("red", "gray70", "gray70"),
        breaks = "setosa",
        na.value = "gray70" # The default is "grey50"
      ) +
      scale_shape_manual(
        values = c(16, 1, 1), 
        breaks = "setosa",
        na.value = 1
      )
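
As a complement to the fix above, here is a minimal sketch of how you could check the mapping yourself and avoid relying on na.value altogether. It first inspects the original (unnamed) scales with layer_data(), then builds the same plot with named values= vectors, which the ggplot2 documentation says are matched to the levels by name. The object names (p_unnamed, p_named, d_unnamed, d_named) are made up for this illustration, and the named-vector variant is an alternative I am suggesting, not something from the question.

```r
library(ggplot2)

# Original approach from the question: unnamed values plus breaks = "setosa".
p_unnamed <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                              color = Species, shape = Species)) +
  geom_point(size = 5, alpha = 0.5) +
  scale_color_manual(values = c("red", "gray70", "gray70"), breaks = "setosa") +
  scale_shape_manual(values = c(16, 1, 1), breaks = "setosa")

# layer_data() returns the layer's data after the scales have been applied,
# so the colour and shape columns show what each point was actually mapped to.
d_unnamed <- layer_data(p_unnamed)
table(d_unnamed$colour, useNA = "ifany")
table(d_unnamed$shape, useNA = "ifany")

# Alternative (assumption: not part of the original answer): name every value,
# so each species is matched by name and breaks= only controls the legend.
p_named <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                            color = Species, shape = Species)) +
  geom_point(size = 5, alpha = 0.5) +
  scale_color_manual(
    values = c(setosa = "red", versicolor = "gray70", virginica = "gray70"),
    breaks = "setosa"
  ) +
  scale_shape_manual(
    values = c(setosa = 16, versicolor = 1, virginica = 1),
    breaks = "setosa"
  )

d_named <- layer_data(p_named)
table(d_named$colour, useNA = "ifany")
table(d_named$shape, useNA = "ifany")
```

With named vectors every species gets an explicit value, so no point should fall back to na.value, while breaks = "setosa" still keeps the other species out of the legend. What the first pair of tables shows may depend on your ggplot2 version, since the question's warning text comes from a recent release.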