r ggplot2

if statement to check for the presence of a dplyr column name


I have written the following function where, when I pass the argument color_col, the plot should color each of the traces with a separate color. In this instance it's the weekend column, which has 3 unique values: "weekday", "sat", "sun".

My dataframe has columns: "episode_start_date", "record_count", "dow" and "weekend".

But the issue is more to do with the lines of code below, where the if statement takes a column name and throws the error "object 'weekend' not found". From what I understand, this is because weekend is not a variable found in the environment but rather just a column name.

Is there a fix for this?

  if (!is.null(color_col)){
    p <- p + aes(color = color_col)
  }

Full function is here

library(ggplot2)
library(splines)  # provides ns(), used in the smoother formula

get_plots <- function(df, col_name, xlab, ylab, color_col = NULL){
  p <- ggplot(df, aes({{col_name}}, record_count))
  # Disabled for now: this is the block that raises the error
  #if (!is.null(color_col)){
    #p <- p + aes(color = color_col)
  #}
  p <- p +
    geom_point(shape = 21, colour = "black", fill = "white") +
    stat_smooth(method = lm, formula = y ~ ns(x, 40), colour = "#D7153A") +
    labs(x = xlab,
         y = ylab) +
    theme_classic()

  return(p)
}

Function call

get_plots(
  df = hosp_state, # symbol
  col_name = episode_start_date, 
  xlab = "Date of episode start", # x axis label
  ylab = "Number of hospitalisations", # y axis label
  color_col = weekend # symbol 
) + scale_color_manual(values = c("#002664","#cebfff","#00aa45"))

Solution

  • There's a lot of inference here; I'm assuming you're passing symbols instead of strings. I'd suggest that using symbols just so that you don't need to type double quotes is a little lazy, and it carries more liability in code maintenance and in corner cases like this one than is normally justified.

    I think you're doing something like this:

    fun <- function(df,y=NULL,col=NULL) {
      a <- if (is.null(col)) {
        aes(cyl, {{y}}, color={{col}})
      } else aes(cyl, {{y}})
      ggplot(df, a) + geom_point()
    }
    fun(mtcars, disp) # works
    fun(mtcars, disp, gear)
    # Error in ccbr() (from #2) : object 'gear' not found
    

    The core issue is that when col is passed as a lazy symbol (as it is here), it doesn't produce an error until something tries to evaluate it, and is.null does just that. One workaround is to wrap the check in tryCatch(.) so the lookup error is caught, such as

    fun <- function(df,y=NULL,col=NULL) {
      a <- if (tryCatch(is.null(col), error=function(ign) TRUE)) {
        aes(cyl, {{y}}, color={{col}})
      } else aes(cyl, {{y}})
      ggplot(df, a) + geom_point()
    }
    fun(mtcars, disp, gear) # now works
    
    names(mtcars)
    #  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
    fun(mtcars, disp, mpg)  # no error but `mtcars$mpg` is not used as the color variable
    

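    Note that mpg is confirmed to be a column name in the dataset, yet it does not produce a color scale. Why? Because nothing here encourages, let alone enforces, evaluation of the symbol given to col= as a column of df: is.null(col) looks it up in the calling environment instead. With ggplot2 attached, mpg resolves to ggplot2's own mpg dataset, so is.null() returns FALSE without an error and the branch without color= is taken.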
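
    The three outcomes above (default NULL, failed lookup, silent wrong lookup) can be reproduced without any plotting. This is only a sketch: probe is a hypothetical helper, and the gear result assumes no object named gear exists in your session or on the search path.

    ```r
    library(ggplot2)  # attaching ggplot2 also makes its `mpg` dataset visible

    # Hypothetical helper: reports what happens when the lazy argument `col` is forced.
    probe <- function(col = NULL) {
      tryCatch(
        if (is.null(col)) "NULL" else "found an object",
        error = function(e) "lookup failed"
      )
    }

    probe()      # "NULL": the default value is used
    probe(gear)  # "lookup failed", assuming nothing named `gear` exists outside mtcars
    probe(mpg)   # "found an object": this is ggplot2::mpg, not mtcars$mpg
    ```

    In other words, any check that forces the argument depends on what happens to exist in the caller's environment, not on the columns of df.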

    Perhaps the simplest and least-ambiguous way is to use strings and ggplot's .data special pronoun.

    fun <- function(df, y, col=NULL) {
      a <- aes(x = cyl, y = .data[[y]], color = if (!is.null(col)) .data[[col]])
      ggplot(df, a) + geom_point()
    }
    fun(mtcars, "disp")
    fun(mtcars, "disp", "mpg")
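
    If you would still rather pass unquoted column names, one alternative is to capture the argument without evaluating it and test the captured expression instead. The following is a sketch under assumptions, not part of the approach above: fun_sym is a hypothetical name, and it relies on rlang's enquo() and quo_is_null(), with !! used to unquote inside aes().

    ```r
    library(ggplot2)
    library(rlang)

    # Sketch only: `fun_sym` is a hypothetical name used for illustration.
    fun_sym <- function(df, y, col = NULL) {
      col_quo <- enquo(col)              # capture the expression without evaluating it
      a <- if (quo_is_null(col_quo)) {   # TRUE only when col was left as NULL
        aes(cyl, {{ y }})
      } else {
        aes(cyl, {{ y }}, color = !!col_quo)
      }
      ggplot(df, a) + geom_point()
    }

    fun_sym(mtcars, disp)        # no color mapping
    fun_sym(mtcars, disp, gear)  # colored by mtcars$gear
    fun_sym(mtcars, disp, mpg)   # colored by mtcars$mpg; columns of df are looked up first
    ```

    Because the symbol is only evaluated by ggplot2 against df, the NULL check never forces it, so neither the "object not found" error nor the clash with ggplot2::mpg arises.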