I am creating a grouped boxplot using plotly. I have to specify the quantiles because I have a specific way of calculating them. I also want to add the outliers to the plot, as in the standard boxplot behavior where plotly calculates the quantiles internally. I am currently trying to add them as a separate trace, but they end up in the middle of the grouped boxes. Maybe there is a way of adding them in the same plotly call that adds the grouped boxes, but if there is, I can't see it. How can I make the outliers line up with the boxes? Reprex below.
library(dplyr)  # %>%, group_by(), summarise(), left_join(), filter()
library(plotly) # plot_ly(), add_trace(), layout()
set.seed(123) # Set seed for reproducibility
# Create the site_name column with 5 different site names, each with 40 rows
site_name <- rep(paste0("site_", 1:5), each = 40)
# Create the site_type column with 20 'A's and 20 'B's for each site
site_type <- rep(c("A", "B"), each = 20, times = 5)
# Create the value column with random numbers
value <- runif(100, min = 0, max = 200) # 100 random numbers between 0 and 200 (recycled by data.frame() to fill 200 rows)
# Combine into a data frame
df <- data.frame(site_name, site_type, value)
# Display the first few rows of the dataset
head(df, 20)
# Group by site_name and site_type, then calculate summary statistics
stats_df <- df %>%
group_by(site_name, site_type) %>%
summarise(
lower_fence = quantile(value, probs = c(0.05), type = 5, na.rm = TRUE),
q1 = quantile(value, probs = c(0.25), type = 5, na.rm = TRUE),
median = quantile(value, probs = c(0.5), type = 5, na.rm = TRUE),
mean = mean(value, na.rm = TRUE),
q3 = quantile(value, probs = c(0.75), type = 5, na.rm = TRUE),
upper_fence = quantile(value, probs = c(0.95), type = 5, na.rm = TRUE),
sd = sd(value, na.rm = TRUE),
.groups = 'drop'
)
# Create the grouped box plot
fig <- plot_ly(
data = stats_df,
x = ~factor(site_name),
color = ~factor(site_type),
colors = c("blue","red"),
type = "box",
source = "boxes",
lowerfence = ~lower_fence,
q1 = ~q1,
median = ~median,
q3 = ~q3,
upperfence = ~upper_fence,
showlegend = TRUE # show_legend was not defined in the reprex
) %>%
layout(boxmode = "group")
# Extract outliers
filtered_df <- df %>%
left_join(stats_df, by = c("site_name", "site_type")) %>%
filter(value < lower_fence | value > upper_fence)
# Add the outlier points
fig <- fig %>%
add_trace(
data = filtered_df,
x = ~factor(site_name),
y = ~value,
color = ~factor(site_type),
colors = c("blue", "red"), # landuse_colors was not defined in the reprex; reuse the box colors
type = "scatter",
mode = "markers",
marker = list(size = 5, opacity = 0.6), # Customize marker appearance
showlegend = FALSE, # Hide legend for scatter points if desired
inherit = FALSE
)
# Show the figure
fig
You were so close! You need to set scattermode = 'group' in the call to layout().
However, because the version of Plotly.js that the R library uses by default is so old, this won't work without updating the Plotly.js dependency that the R library relies on.
The points were a bit off center after updating, so boxgap was used to align the markers. The value of 1/5 was used because there are 6 groups, leaving 5 between-group spaces.
You can use arguments like boxgap and boxgroupgap to adjust the appearance, but I don't believe that scattergap or scattergroupgap were added to Plotly with the addition of scattermode.
I used a UDF (user-defined function) to update the Plotly.js dependency. (There are multiple arguments in the Plotly library that don't work in R without this update, so this function could be useful for a variety of reasons...)
fixer <- function(plt) {
# repoint the Plotly.js dependency (element 5 of the list here) at a newer CDN build so that all code works
plt$dependencies[[5]]$src$file = NULL
plt$dependencies[[5]]$src$href = "https://cdn.plot.ly"
plt$dependencies[[5]]$script = "plotly-2.33.0.min.js"
plt$dependencies[[5]]$local = FALSE
plt$dependencies[[5]]$package = NULL
plt
}
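The hard-coded index 5 in fixer() may not hold for every plotly version, so here is a minimal sketch of a variant that looks the dependency up by name instead. It assumes the Plotly.js dependency is named "plotly-main", which is what I believe the R package calls it, but please check the names on your own install. fixer_by_name() and its version and dep_name arguments are hypothetical names for illustration, not part of plotly.

```r
# Hypothetical variant of fixer(): find the Plotly.js dependency by name
# instead of by position. Assumes the dependency is called "plotly-main".
fixer_by_name <- function(plt, version = "2.33.0", dep_name = "plotly-main") {
  # collect the name of every dependency ("" when a name is missing)
  dep_names <- vapply(
    plt$dependencies,
    function(d) if (is.null(d$name)) "" else as.character(d$name),
    character(1)
  )
  idx <- which(dep_names == dep_name)
  if (length(idx) != 1) {
    stop("Expected exactly one dependency named '", dep_name,
         "', found ", length(idx))
  }
  dep <- plt$dependencies[[idx]]
  dep$src <- list(href = "https://cdn.plot.ly")        # drop the local file, use the CDN
  dep$script <- paste0("plotly-", version, ".min.js")  # e.g. plotly-2.33.0.min.js
  dep$local <- FALSE
  dep$package <- NULL
  plt$dependencies[[idx]] <- dep
  plt
}
```

It is used the same way as fixer(), i.e. as the last step of the pipe. It stops with an error rather than silently editing the wrong entry if no dependency with that name is found.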
The boxplot, markers, layout and the updated dependency
If you comment out, hide, or remove fixer(), you'll see that the call for scattermode is ignored.
plot_ly(data = stats_df, x = ~site_name, color = ~site_type, # boxes
colors = c("blue","red"), type = "box",
lowerfence = ~lower_fence, q1 = ~q1, median = ~median,
q3 = ~q3, upperfence = ~upper_fence) %>%
add_markers(data = filtered_df, x = ~site_name, y = ~value, # markers
color = ~site_type, showlegend = FALSE,
marker = list(size = 5, opacity = 0.6)) %>%
layout(boxmode = "group", scattermode = "group", boxgap = 1/5) %>% # align
fixer() # update Plotly dependency
The boxplot with aligned outliers.