r plotly boxplot

colouring lines around plotly box plot


I am creating a grouped boxplot in plotly using predefined quantiles. I want to color the lines around the boxes based on a separate variable. I can't seem to do this directly in my plotly call. There is a nice solution here which involves changing the line colours as a post-processing step using plotly_build. The example in that link works well but the structure of the data is different when using pre-defined quantiles, and I can't seem to access the data as in that example. Perhaps there is a way but I can't figure it out.

My attempted solution adds a second box trace with a transparent fill, as follows:

library(plotly)
library(dplyr)

# CREATE DUMMY DATA
set.seed(123) # Set seed for reproducibility

site_name <- rep(paste0("site_", 1:5), each = 40) # Create the site_name column with 5 different site names, each with 40 rows
site_type <- rep(c("A", "B"), each = 20, times = 5) # Create the site_type column with 20 'A's and 20 'B's for each site
value <- runif(100, min = 0, max = 200) # Create 100 random numbers between 0 and 200; data.frame() recycles them to fill the 200 rows
# Combine into a data frame
df <- data.frame(site_name, site_type, value)

# Generate site_status randomly for each combination of site_name and site_type
unique_combinations <- unique(df[c("site_name", "site_type")])
unique_combinations$site_status <- sample(c("Good", "Bad"), nrow(unique_combinations), replace = TRUE)
# Merge site_status back to the original df
df <- df %>%
  left_join(unique_combinations, by = c("site_name", "site_type"))

# Display the first few rows of the dataset
head(df, 20)

# MAKE SUMMARY DATA

# Group by site_name and site_type, then calculate summary statistics
stats_df <- df %>%
  group_by(site_name, site_type) %>%
  summarise(
    lower_fence = quantile(value, probs = c(0.05), type = 5, na.rm = TRUE),
    q1 = quantile(value, probs = c(0.25), type = 5, na.rm = TRUE),
    median = quantile(value, probs = c(0.5), type = 5, na.rm = TRUE),
    mean = mean(value, na.rm = TRUE),
    q3 = quantile(value, probs = c(0.75), type = 5, na.rm = TRUE),
    upper_fence = quantile(value, probs = c(0.95), type = 5, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    site_status = unique(site_status),
    .groups = 'drop'
  )

# PLOTTING CODE

# make box plot
fig <- plot_ly(data = stats_df, 
               x = ~site_name, 
               color = ~site_type,   # boxes
               colors = c("blue","red"), 
               type = "box",
               lowerfence = ~lower_fence, 
               q1 = ~q1, 
               median = ~median,
               q3 = ~q3, 
               upperfence = ~upper_fence) %>%
  layout(boxmode = "group", boxgap = 1/5)

# Keep only the boxes that should get a green outline
bad_data <- stats_df %>% filter(site_status == "Bad")

# Add green boxes
fig <- fig %>% plotly::add_trace(
  x = factor(bad_data$site_name),
  color = factor(bad_data$site_type),
  colors = c("blue","red"),
  type = "box",
  lowerfence = bad_data$lower_fence,
  q1 = bad_data$q1,
  median = bad_data$median,
  q3 = bad_data$q3,
  upperfence = bad_data$upper_fence,
  line = list(color = "green"),
  fillcolor = "rgba(255,0,0,0.0)", # Fully transparent fill (alpha = 0)
  boxmean = FALSE, # Avoid adding box means
  showlegend = TRUE,
  inherit = FALSE
)

# Show the figure
fig

This works like a charm when the data aren't grouped. With a grouped boxplot, however, the outlines are drawn as separate grouped items beside the existing boxes, and the colours of the existing boxes change, as in this image:

[Image: the resulting grouped box plot]

I'm not sure if the grouping attribute can be forced somehow. TBH I'm not really sure if this approach is viable.

I would love to be able to change the line colors using the following kind of approach as per the example referenced above, but it doesn't seem to work with pre-defined quantiles:

built_fig <- plotly_build(fig)

lapply(1:length(stats_df$site_status),
       function(i){
         nm = stats_df$site_status[i]
         cr = ifelse(nm == "Good",
                     "#66FF66", "black")
         built_fig$x$data[[i]]$line$color <<- cr  # set the outline colour of trace i
       }
)

Any suggestions greatly appreciated.


Solution

  • Here is one option that uses four traces, one for each combination of site type and status, together with the offsetgroup attribute, which makes the traces of the same site type share one slot within each group. This gives the desired result without post-processing the plotly object.

    library(plotly)
    
    plot_ly(
      data = stats_df |> head(0),
      lowerfence = ~lower_fence,
      q1 = ~q1,
      median = ~median,
      q3 = ~q3,
      upperfence = ~upper_fence,
      x = ~site_name,
      offsetgroup = ~site_type,
      color = ~site_type, # boxes
      colors = c("blue", "red"),
      type = "box"
    ) |>
      plotly::add_trace(
        data = stats_df %>% filter(site_status == "Bad", site_type == "A"),
        line = list(color = "green"),
        showlegend = FALSE,
        legendgroup = "A"
      ) |>
      plotly::add_trace(
        data = stats_df %>% filter(site_status == "Bad", site_type == "B"),
        line = list(color = "green"),
        showlegend = FALSE,
        legendgroup = "B"
      ) |>
      plotly::add_trace(
        data = stats_df %>% filter(site_status != "Bad", site_type == "A"),
        line = list(color = "black"),
        legendgroup = "A"
      ) |>
      plotly::add_trace(
        data = stats_df %>% filter(site_status != "Bad", site_type == "B"),
        line = list(color = "black"),
        legendgroup = "B"
      ) |>
      layout(boxmode = "group")
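
The four add_trace() calls above differ only in the filter and the line colour, so they could also be generated in a loop. The following is a minimal sketch of that idea, not a tested drop-in replacement. The small `stats_df` defined here is a made-up stand-in so the block runs on its own, and `status_colours` and `combos` are hypothetical helper names that do not appear in the question. It relies on the outline colour of a box trace (`line$color`) being a single value per trace, which is why one trace per type and status combination is needed.

```r
library(plotly)

# Hypothetical stand-in for stats_df from the question (made-up numbers),
# so that this sketch runs on its own.
stats_df <- data.frame(
  site_name   = rep(c("site_1", "site_2"), each = 2),
  site_type   = rep(c("A", "B"), times = 2),
  site_status = c("Good", "Bad", "Bad", "Good"),
  lower_fence = c(10, 20, 15, 5),
  q1          = c(40, 60, 50, 30),
  median      = c(90, 100, 95, 80),
  q3          = c(140, 150, 145, 130),
  upper_fence = c(190, 195, 185, 180)
)

# Outline colour per status (assumed mapping, same colours as the answer).
status_colours <- c(Bad = "green", Good = "black")

# One row per trace to draw: every site_type/site_status pair in the data.
combos <- unique(stats_df[c("site_type", "site_status")])

# Empty base figure that only declares the attributes shared by all traces.
fig <- plot_ly(
  data = stats_df[0, ],
  lowerfence = ~lower_fence,
  q1 = ~q1,
  median = ~median,
  q3 = ~q3,
  upperfence = ~upper_fence,
  x = ~site_name,
  offsetgroup = ~site_type,
  color = ~site_type,
  colors = c("blue", "red"),
  type = "box"
)

# Add one trace per combination; attributes not given here are inherited.
for (i in seq_len(nrow(combos))) {
  type_i   <- as.character(combos$site_type[i])
  status_i <- as.character(combos$site_status[i])
  keep     <- stats_df$site_type == type_i & stats_df$site_status == status_i

  fig <- add_trace(
    fig,
    data = stats_df[keep, ],
    line = list(color = status_colours[[status_i]]),
    legendgroup = type_i,
    showlegend = status_i == "Good" # one legend entry per site type
  )
}

fig <- layout(fig, boxmode = "group")
```

Printing `fig` renders the plot. As in the answer above, only the "Good" traces get a legend entry, so a site type with no "Good" boxes would not appear in the legend.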