I am creating a grouped boxplot in plotly using predefined quantiles, and I want to color the lines around the boxes based on a separate variable. I can't seem to do this directly in my plotly call. There is a nice solution here which changes the line colours as a post-processing step using plotly_build. The example in that link works well, but the structure of the data is different when using predefined quantiles, and I can't figure out how to access the data as in that example.
My attempted solution, shown below, adds a new trace with a transparent fill:
library(plotly)
library(dplyr)
# CREATE DUMMY DATA
set.seed(123) # Set seed for reproducibility
site_name <- rep(paste0("site_", 1:5), each = 40) # 5 site names, 40 rows each
site_type <- rep(c("A", "B"), each = 20, times = 5) # 20 'A's then 20 'B's within each site
value <- runif(200, min = 0, max = 200) # One random value (between 0 and 200) per row, so nothing is recycled
# Combine into a data frame
df <- data.frame(site_name, site_type, value)
# Generate site_status randomly for each combination of site_name and site_type
unique_combinations <- unique(df[c("site_name", "site_type")])
unique_combinations$site_status <- sample(c("Good", "Bad"), nrow(unique_combinations), replace = TRUE)
# Merge site_status back to the original df
df <- df %>%
  left_join(unique_combinations, by = c("site_name", "site_type"))
# Display the first few rows of the dataset
head(df, 20)
# MAKE SUMMARY DATA
# Group by site_name and site_type, then calculate summary statistics
stats_df <- df %>%
  group_by(site_name, site_type) %>%
  summarise(
    lower_fence = quantile(value, probs = c(0.05), type = 5, na.rm = TRUE),
    q1 = quantile(value, probs = c(0.25), type = 5, na.rm = TRUE),
    median = quantile(value, probs = c(0.5), type = 5, na.rm = TRUE),
    mean = mean(value, na.rm = TRUE),
    q3 = quantile(value, probs = c(0.75), type = 5, na.rm = TRUE),
    upper_fence = quantile(value, probs = c(0.95), type = 5, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    site_status = unique(site_status),
    .groups = 'drop'
  )
# PLOTTING CODE
# make box plot
fig <- plot_ly(data = stats_df,
               x = ~site_name,
               color = ~site_type, # boxes
               colors = c("blue","red"),
               type = "box",
               lowerfence = ~lower_fence,
               q1 = ~q1,
               median = ~median,
               q3 = ~q3,
               upperfence = ~upper_fence) %>%
  layout(boxmode = "group", boxgap = 1/5)
# Select the boxes that should get a green outline
bad_data <- stats_df %>% filter(site_status == "Bad")
# Add the green outlines as a second set of boxes
fig <- fig %>% plotly::add_trace(
  x = factor(bad_data$site_name),
  color = factor(bad_data$site_type),
  colors = c("blue","red"),
  type = "box",
  lowerfence = bad_data$lower_fence,
  q1 = bad_data$q1,
  median = bad_data$median,
  q3 = bad_data$q3,
  upperfence = bad_data$upper_fence,
  line = list(color = "green"),
  fillcolor = "rgba(255,0,0,0.0)", # Fully transparent fill (alpha = 0)
  boxmean = FALSE, # Avoid adding box means
  showlegend = TRUE,
  inherit = FALSE
)
# Show the figure
fig
This works like a charm when the data aren't grouped. With a grouped boxplot, however, the outlines are drawn as separate grouped items next to the existing boxes, and the colors of the existing boxes change, as in this image:
I'm not sure whether the grouping attribute can be forced somehow, or whether this approach is viable at all.
I would love to be able to change the line colors using the following kind of approach, as per the example referenced above, but it doesn't seem to work with predefined quantiles:
built_fig <- plotly_build(fig)
lapply(seq_along(stats_df$site_status),
  function(i) {
    nm <- stats_df$site_status[i]
    cr <- ifelse(nm == "Good", "#66FF66", "black")
    built_fig$x$data[[i]]$line$color <<- cr # set the outline colour by site status
  }
)
Any suggestions greatly appreciated.
Here is one option which uses four traces, i.e. one trace for each combination of site type and status, together with the offsetgroup= attribute, to create your desired result without needing to manipulate the plotly object.
library(plotly)
plot_ly(
  # Zero-row base: draws nothing, but its mappings are inherited by each add_trace()
  data = stats_df |> head(0),
  lowerfence = ~lower_fence,
  q1 = ~q1,
  median = ~median,
  q3 = ~q3,
  upperfence = ~upper_fence,
  x = ~site_name,
  offsetgroup = ~site_type, # keeps A and B boxes in their own slots
  color = ~site_type, # boxes
  colors = c("blue", "red"),
  type = "box"
) |>
  # "Bad" boxes: green outline, hidden from the legend
  plotly::add_trace(
    data = stats_df %>% filter(site_status == "Bad", site_type == "A"),
    line = list(color = "green"),
    showlegend = FALSE,
    legendgroup = "A"
  ) |>
  plotly::add_trace(
    data = stats_df %>% filter(site_status == "Bad", site_type == "B"),
    line = list(color = "green"),
    showlegend = FALSE,
    legendgroup = "B"
  ) |>
  # Remaining boxes: black outline, shown in the legend
  plotly::add_trace(
    data = stats_df %>% filter(site_status != "Bad", site_type == "A"),
    line = list(color = "black"),
    legendgroup = "A"
  ) |>
  plotly::add_trace(
    data = stats_df %>% filter(site_status != "Bad", site_type == "B"),
    line = list(color = "black"),
    legendgroup = "B"
  ) |>
  layout(boxmode = "group")
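If there are more than a couple of types or statuses, the four hand-written calls could probably be replaced by a loop over the type/status combinations. The following is only a sketch of that idea, not a tested drop-in replacement: stats_demo is a made-up stand-in for stats_df so the code runs on its own, outline_cols and fill_cols are hypothetical lookup vectors, and the fill colour is set explicitly with fillcolor rather than through the color= mapping used above, so the box shading may look slightly different.

```r
library(plotly)
library(dplyr)

# Hypothetical stand-in for stats_df (constant quantiles, fixed statuses)
stats_demo <- expand.grid(
  site_name = paste0("site_", 1:4),
  site_type = c("A", "B"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    lower_fence = 10, q1 = 40, median = 80, q3 = 120, upper_fence = 180,
    site_status = c("Good", "Bad", "Good", "Bad", "Bad", "Good", "Bad", "Good")
  )

# Assumed lookups: outline colour by status, fill colour by type
outline_cols <- c(Good = "black", Bad = "green")
fill_cols <- c(A = "blue", B = "red")

# One trace per combination of site type and status
combos <- distinct(stats_demo, site_type, site_status)

fig <- plot_ly()
for (i in seq_len(nrow(combos))) {
  type_i <- combos$site_type[i]
  status_i <- combos$site_status[i]
  d <- dplyr::filter(stats_demo, site_type == type_i, site_status == status_i)
  fig <- add_trace(
    fig,
    type = "box",
    x = d$site_name,
    lowerfence = d$lower_fence,
    q1 = d$q1,
    median = d$median,
    q3 = d$q3,
    upperfence = d$upper_fence,
    offsetgroup = type_i,  # same slot for both statuses of a type
    legendgroup = type_i,
    name = type_i,
    fillcolor = fill_cols[[type_i]],
    line = list(color = outline_cols[[status_i]]),
    showlegend = status_i == "Good"  # one legend entry per type
  )
}
fig <- layout(fig, boxmode = "group")
fig
```

The key point is the same as in the explicit version: every trace of a given site type shares one offsetgroup, so the differently outlined boxes land in the same slot instead of being dodged next to each other.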