Custom scale_color with gradient and manual value

I want to build a ggplot with a color gradient containing an exception.

For instance, say that I want the gradient to go from blue to black, but with the special value 21 being red.

This can be done using 2 geoms with data filtering:

library(tidyverse)
mtcars %>%
  ggplot(aes(x = qsec, y = wt, color = mpg)) +
  geom_point(size = 3, data=~filter(.x, mpg==21), color="red") +
  geom_point(size = 3, data=~filter(.x, mpg>21)) +
  scale_color_gradient(low="blue", high="black",
                        transform = "log")

ggplot output

Created on 2024-12-08 with reprex v2.1.1

However, using this technique has annoying side effects, one being that the legend doesn't contain the red value.

Is there a way to achieve the same result with one standard geom?

Solution

[Corrected and updated version, 09.12.24] You can use scale_color_gradientn:

mtcars %>%
  ggplot(aes(x = qsec, y = wt, color = mpg)) +
  geom_point(size = 3) +
  scale_color_gradientn(
    colours = c("blue", "red", "black"),  # colour at each stop
    values = c(0, 0.5, 1),                # stop positions on the rescaled 0 to 1 range
    transform = "log"
  )

The values are the positions, within the range of the mpg column rescaled to 0 to 1, at which each colour applies (0 = blue, 0.5 = red, 1 = black). scale_color_gradientn then interpolates the colours between these positions.
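To put the red stop on a particular value instead of the midpoint, the position of that value in the rescaled range can be computed first. The following is a minimal sketch, assuming the built-in mtcars data and the value 21 from the question; stop_pos and stop_pos_log are illustrative names only. Since the scale above uses a log transform, and ggplot2 rescales the data after transforming it, the position would have to be computed on the log scale in that case:

```r
library(scales)

mpg <- mtcars$mpg

# Position of 21 within the linear range of mpg, as a number in [0, 1]
stop_pos <- rescale(21, to = c(0, 1), from = range(mpg))

# With transform = "log", the position is taken on log(mpg) instead
stop_pos_log <- rescale(log(21), to = c(0, 1), from = range(log(mpg)))

c(linear = stop_pos, log = stop_pos_log)
```

Either number could then replace the 0.5 in values, depending on whether the scale is transformed.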

Using this, we can build a custom function highlight_color that takes a data frame df and the name of a column, column_name, to apply the colour ramp to. highlight_val is the value to be marked in red.

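Note: The red fades back to blue within ± threshold_to_highlightValue of the highlighted value, measured on the rescaled 0 to 1 range. If the highlighted value is the minimum or maximum of the column, the red band extends to one side only.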
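Before the full function, here is a rough sketch of what the threshold means in data units. It assumes the mtcars data, the default threshold of 0.01, and a linear (untransformed) scale; half_width and in_band are illustrative names only:

```r
col <- mtcars$mpg
threshold <- 0.01  # same default as threshold_to_highlightValue

# Half-width of the red band around the highlighted value, in mpg units
half_width <- threshold * diff(range(col))

# Distinct mpg values that fall inside the band around 21
# (only 21 itself is pure red; the rest of the band blends towards blue)
in_band <- sort(unique(col[abs(col - 21) <= half_width]))

half_width
in_band
```

A larger threshold widens the band, so neighbouring values would start to pick up a reddish tint.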

library(tidyverse)
library(scales)
# Create a custom color scale function
highlight_color <- function(df, column_name, highlight_val = 21, threshold_to_highlightValue = 0.01) {
  # Convert column_name from string to actual column reference
  col <- df[[column_name]]
  
  # Ensure the column is numeric for calculations
  if (!is.numeric(col)) {
    stop("The column must be numeric.")
  }
  
  # Check if highlight_val exists in the column
  if (highlight_val %in% col) {
    # Calculate the normalized position of the highlight value in the scale
    pos_in_df <- (highlight_val - min(col)) / (max(col) - min(col))
    
    # Define color palette and corresponding values
    if (pos_in_df == 0) {
      colors <- c("red", "blue", "black")
      values <- c(0, threshold_to_highlightValue, 1)
    } else if (pos_in_df == 1) {
      colors <- c("black", "blue", "red")
      values <- c(0, 1 - threshold_to_highlightValue, 1)
    } else {
      colors <- c("cadetblue1", "blue", "red","blue", "black")
      values <- c(0, pos_in_df - threshold_to_highlightValue, pos_in_df, pos_in_df + threshold_to_highlightValue, 1)
    }
    
    # Create and return the plot with the customized gradient
    p <- df %>%
      ggplot(aes(x = qsec, y = wt, color = !!sym(column_name))) +
      geom_point(size = 3) +
      scale_color_gradientn(
        colours = colors,
        values = values,
        rescaler = scales::rescale,  # Ensures proper handling of normalized values
        limits = range(col)          # Ensure the color scale fits the data range
      ) +
      theme_minimal() +
      labs(color = column_name)  # Add dynamic label for the legend
    
    return(p)
  } else {
    # If the highlight value is not in the column, stop with an error
    stop(paste0("Value ", highlight_val, " is not in column ", column_name))
  }
}

# Example usage
highlight_color(mtcars, "mpg", highlight_val = 21, threshold_to_highlightValue = 0.01)

which generates: