Hello colleagues,
I am trying to build a distribution diagram that will satisfy the following conditions:
Problem: To visualize everything that is 'more than limit', I have to make the x-axis discrete; otherwise the last bar would be arbitrarily wide, covering all values from the limit to the maximum. But to put a vertical line at a specified point, the x-axis needs to be continuous.
Any ideas on how I can work around this?
Code: Here is a code example:
library(dplyr)
library(ggplot2)

data <- data.frame(value = runif(1000, min = 0, max = 1000))
data$value <- round(data$value, digits = 0)
median_elapsed <- median(data$value)

# Ten equal-width bins from 0 to the median, plus one open-ended bin above it
bin_breaks <- c(seq(0,
                    median_elapsed,
                    length.out = 11),
                Inf)
bin_labels <- c(seq(0,
                    median_elapsed - (median_elapsed / 10),
                    length.out = 10),
                paste0("> ", median_elapsed))
data$bins <- cut(data$value,
                 breaks = bin_breaks,
                 labels = bin_labels,
                 include.lowest = TRUE,
                 right = FALSE)

# Share of observations in each bin, in percent
get_home_data_percent <- data %>%
  group_by(bins) %>%
  summarize(count = n()) %>%
  mutate(percentage = count / sum(count) * 100)

ggplot(get_home_data_percent, aes(x = bins, y = percentage)) +
  geom_bar(stat = "identity", just = 0) +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Elapsed Time",
       y = "Percentage",
       title = "Histogram of Elapsed Time") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Illustration: This gives me almost everything I need, except the vertical line at the median value, because the x-axis is discrete.
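One possible workaround that keeps the discrete axis: ggplot2 places the levels of a discrete scale at integer positions 1, 2, 3, ..., so `geom_vline()` can still take a numeric `xintercept`. The following is only a sketch under the assumptions of the code above: with `just = 0` each bar starts at its level's position, and the last bin starts at the median, so the median line falls on the position of the last level. The names `pct`, `median_pos` and `p`, the fixed seed, and the red colour are illustrative choices, not part of the original code.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
data <- data.frame(value = round(runif(1000, min = 0, max = 1000)))
median_elapsed <- median(data$value)

# Same binning as in the question: ten bins up to the median, one bin above it
bin_breaks <- c(seq(0, median_elapsed, length.out = 11), Inf)
bin_labels <- c(seq(0, median_elapsed - median_elapsed / 10, length.out = 10),
                paste0("> ", median_elapsed))
data$bins <- cut(data$value, breaks = bin_breaks, labels = bin_labels,
                 include.lowest = TRUE, right = FALSE)

pct <- data %>%
  count(bins, .drop = FALSE) %>%
  mutate(percentage = n / sum(n) * 100)

# Discrete levels sit at 1, 2, ..., nlevels; with just = 0 the last bar
# starts at nlevels, which is where the median boundary lies.
median_pos <- nlevels(data$bins)

p <- ggplot(pct, aes(x = bins, y = percentage)) +
  geom_bar(stat = "identity", just = 0) +
  scale_x_discrete(drop = FALSE) +
  geom_vline(xintercept = median_pos, colour = "red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

This only works cleanly because the line coincides with a bin boundary. For a line at an arbitrary value inside a bin, the position would have to be interpolated by hand, which is where a continuous axis becomes more convenient.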
It's not clear to me that you couldn't put bins on a continuous scale. (Although maybe we'd want some custom axis labeling here for the last bin to clarify its meaning...)
Here I calculate the median for geom_vline, and separately set a cutoff (upper) for the top category, which collects all values at or above that cutoff.
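library(dplyr); library(ggplot2)

med = median(data$value)   # position of the vertical line
upper = 600                # everything at or above this goes into the top bin
bin_size = upper / 11

data |>
  # floor each value to the start of its bin; cap the top bin at `upper`
  mutate(bin = if_else(value < upper,
                       value %/% bin_size * bin_size, upper)) |>
  summarize(n = n(), .by = bin) |>
  # shift by half a bin so each bar is centered over its interval
  ggplot(aes(bin + bin_size/2, n / sum(n))) +
  geom_col() +
  geom_vline(xintercept = med) +
  scale_x_continuous(breaks = scales::breaks_width(bin_size),
                     labels = scales::number_format(accuracy = 0.1))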
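As for the custom labeling of the last bin mentioned above, one way to sketch it is to pass explicit breaks and labels to `scale_x_continuous()`, so that the tick at the left edge of the top bin reads ">= 600" instead of a plain number. This is only an illustration, not a tested recipe for every dataset: the names `bin_edges`, `edge_labels` and `p`, and the fixed seed, are assumptions of mine, and the example regenerates the data from the question so that it runs on its own.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
data <- data.frame(value = round(runif(1000, min = 0, max = 1000)))

med <- median(data$value)
upper <- 600
bin_size <- upper / 11

# Left edges of the 11 regular bins plus the cutoff itself (12 ticks in total)
bin_edges <- seq(0, upper, length.out = 12)

# Regular ticks keep one decimal place; the last tick marks the open-ended bin
edge_labels <- c(scales::number(head(bin_edges, -1), accuracy = 0.1),
                 paste0(">= ", upper))

p <- data |>
  mutate(bin = if_else(value < upper,
                       value %/% bin_size * bin_size, upper)) |>
  summarize(n = n(), .by = bin) |>
  ggplot(aes(bin + bin_size / 2, n / sum(n))) +
  geom_col() +
  geom_vline(xintercept = med) +
  scale_x_continuous(breaks = bin_edges, labels = edge_labels)

print(p)
```

Because the ticks stop at the cutoff, no numeric label appears at the right edge of the top bar, which avoids suggesting that the top bin ends at a specific value.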