I have a data frame from which I created a "stats" data frame (mean, min, max, median, Q1, etc.), and then I plotted histograms with facet_wrap. Now I want to add vertical lines to each histogram based on the columns of the "stats" data frame. Adding a single line at x = 0 works fine, but when I add the min, mean, or Q1, many lines are displayed in every facet instead of a single line per stat. Can anyone tell me where the problem in my code is?
library(ggplot2)
library(tidyr)  # for gather()
# Generate data
set.seed(123)
# Create a data frame with 10 columns and 100 rows
data <- data.frame(matrix(runif(1000, -3, 1), ncol = 10))
# Create a bimodal distribution in each column
for (i in 1:10) {
data[,i] <- ifelse(runif(100) < 0.5, rnorm(100, -1, 0.5), rnorm(100, 0, 0.5))
}
# Print the first few rows of the data frame
head(data)
# Data Stats
data_stats <- data.frame(
Mean = apply(data, 2, mean, na.rm = TRUE),
SD = apply(data, 2, sd, na.rm = TRUE),
Max = apply(data, 2, max, na.rm = TRUE),
Median = apply(data, 2, median, na.rm=TRUE),
Q1 = apply(data, 2, quantile, probs = 0.25, na.rm = TRUE),
Min = apply(data, 2, min, na.rm =TRUE)
)
# --- Long data frame
data_long <- gather(data)
#---- Plot try 1
ggplot(data_long, aes(x = value)) +
geom_histogram(bins = 20) +
geom_vline(xintercept = 0, color = "red") +
geom_vline(xintercept = data_stats$Mean, linetype = "solid", color = "green") +
geom_vline(xintercept = data_stats$Q1, linetype = "solid", color = "blue") +
geom_vline(xintercept = data_stats$Min, linetype = "solid", color = "black") +
facet_wrap(~key, scales = 'free')
#---- Plot try 2
ggplot(data_long, aes(x = value)) +
geom_histogram(bins = 20) +
geom_vline(xintercept = 0, color = "red") +
geom_vline(data = data_stats, aes(xintercept = Mean), linetype = "solid", color = "green") +
geom_vline(data = data_stats, aes(xintercept = Q1), linetype = "solid", color = "blue") +
geom_vline(data = data_stats, aes(xintercept = Min), linetype = "solid", color = "black") +
facet_wrap(~key, scales = 'free')
Your data_stats data frame has 10 rows and no "key" column, so ggplot cannot assign each row to a facet and draws all ten values of each column in every facet. To fix it, add the row names as a column named key and pivot to long format. You also save a few lines of plotting code by mapping color to the name column of the pivoted data frame, so only a single geom_vline is needed:
library(tidyverse)
data_stats %>%
mutate(zero = 0) %>%
rownames_to_column(var = "key") %>%
pivot_longer(-key) %>%
filter(name %in% c("Mean", "Q1", "Min", "zero")) %>%
ggplot(aes(x = value)) +
geom_histogram(bins = 20, data = tidyr::gather(data)) +
geom_vline(aes(xintercept = value, color = name)) +
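# Name the colors: unnamed values are matched to the sorted levels (Mean, Min, Q1, zero)
scale_color_manual(NULL, values = c(Mean = "green", Q1 = "blue", Min = "black", zero = "red")) +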
facet_wrap(~key, scales = 'free')
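If you would rather keep the structure of "Plot try 2" with one geom_vline per stat, a smaller change may be enough: give the wide stats data frame a key column that matches the facet variable. The following is only a sketch under that assumption. It rebuilds simplified stand-ins for data, data_long and data_stats (plain rnorm data rather than the bimodal columns from the question) so that it runs on its own.

```r
library(ggplot2)
library(tidyr)

set.seed(123)
# Stand-in data: 10 columns (X1..X10), 100 rows
data <- data.frame(matrix(rnorm(1000), ncol = 10))
data_long <- gather(data)  # columns: key, value

# Wide stats table with an explicit key column matching the facet variable
data_stats <- data.frame(
  key  = names(data),
  Mean = unname(sapply(data, mean)),
  Q1   = unname(sapply(data, quantile, probs = 0.25)),
  Min  = unname(sapply(data, min))
)

# Because data_stats has a key column, each row is drawn only in its own facet
p <- ggplot(data_long, aes(x = value)) +
  geom_histogram(bins = 20) +
  geom_vline(xintercept = 0, color = "red") +
  geom_vline(data = data_stats, aes(xintercept = Mean), color = "green") +
  geom_vline(data = data_stats, aes(xintercept = Q1), color = "blue") +
  geom_vline(data = data_stats, aes(xintercept = Min), color = "black") +
  facet_wrap(~key, scales = "free")

p
```

This keeps the fixed colors from the question but draws no legend. The pivoted version above is shorter and adds a legend, so which one fits better depends on how many stats you plan to show.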