r ggplot2 histogram line

Is there a way to add vertical lines to histograms in R, where the line positions come from a separate data frame?


I have a data frame from which I created a "stats" data frame (mean, min, max, median, Q1, etc.), and then I made faceted histograms with facet_wrap. Now I want to add vertical lines to each histogram based on the columns of the "stats" data frame. Adding a single line at x = 0 works fine, but when I add the min, mean, or Q1, many lines are displayed in every facet instead of a single line for that facet's statistic. Can anyone tell me where the problem is in my code?

# Packages used below: ggplot2 for plotting, tidyr for gather()
library(ggplot2)
library(tidyr)

# Generate data
set.seed(123)

# Create a data frame with 10 columns and 100 rows
data <- data.frame(matrix(runif(1000, -3, 1), ncol = 10))

# Create a bimodal distribution in each column
for (i in 1:10) {
  data[,i] <- ifelse(runif(100) < 0.5, rnorm(100, -1, 0.5), rnorm(100, 0, 0.5))
}

# Print the first few rows of the data frame
head(data)


# Data Stats
data_stats <- data.frame(
  Mean = apply(data, 2, mean, na.rm = TRUE),
  SD = apply(data, 2, sd, na.rm = TRUE),
  Max = apply(data, 2, max, na.rm = TRUE),
  Median = apply(data, 2, median, na.rm=TRUE),
  Q1 = apply(data, 2, quantile, probs = 0.25, na.rm = TRUE),
  Min = apply(data, 2, min, na.rm =TRUE)
)


# --- Long data frame
data_long <- gather(data)

#---- Plot try 1
ggplot(data_long, aes(x = value)) +
  geom_histogram(bins = 20) +
  geom_vline(xintercept = 0, color = "red") +
  geom_vline(xintercept = data_stats$Mean, linetype = "solid", color = "green") +
  geom_vline(xintercept = data_stats$Q1, linetype = "solid", color = "blue") +  
  geom_vline(xintercept = data_stats$Min, linetype = "solid", color = "black") +
  facet_wrap(~key, scales = 'free')

#---- Plot try 2
ggplot(data_long, aes(x = value)) +
  geom_histogram(bins = 20) +
  geom_vline(xintercept = 0, color = "red") +
  geom_vline(data = data_stats, aes(xintercept = Mean), linetype = "solid", color = "green") +
  geom_vline(data = data_stats, aes(xintercept = Q1), linetype = "solid", color = "blue") +  
  geom_vline(data = data_stats, aes(xintercept = Min), linetype = "solid", color = "black") +
  facet_wrap(~key, scales = 'free')

Solution

  • Your data_stats data frame has 10 rows (one per column of data) and no "key" column, so ggplot cannot match its rows to the facets and draws all ten values of each statistic in every facet. To fix it, add the row names as a "key" column and pivot to long format. Mapping color to the name column of the pivoted data frame also saves a few lines of plotting code, since you then need only a single geom_vline:

    library(tidyverse)
    
    data_stats %>%
      mutate(zero = 0) %>%
      rownames_to_column(var = "key") %>%
      pivot_longer(-key) %>%
      filter(name %in% c("Mean", "Q1", "Min", "zero")) %>%
      ggplot(aes(x = value)) +
      geom_histogram(bins = 20, data = tidyr::gather(data)) +
      geom_vline(aes(xintercept = value, color = name)) +
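      # Named values keep the question's colors; unnamed values would be
      # assigned in alphabetical order of name (Mean, Min, Q1, zero)
      scale_color_manual(NULL, values = c(Mean = "green", Q1 = "blue", Min = "black", zero = "red")) +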
      facet_wrap(~key, scales = 'free') 
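
As a variation on the same idea, the per-facet statistics can be computed directly from the long data, so the "key" column exists from the start and no row names need to be converted. This is only a sketch under some assumptions: the data is regenerated in a similar way to the question (the random values will differ), the names stats_long, stat, xint and stat_colors are made up for illustration, and pivot_longer is used in place of gather.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(123)

# Data shaped like the question's: 10 columns of 100 bimodal values each
# (as.data.frame names the columns V1..V10)
data <- as.data.frame(replicate(
  10,
  ifelse(runif(100) < 0.5, rnorm(100, -1, 0.5), rnorm(100, 0, 0.5))
))

# Long format: one row per observation, with the column name in `key`
data_long <- pivot_longer(data, everything(),
                          names_to = "key", values_to = "value")

# One row per (key, statistic), so each facet only receives its own lines
stats_long <- data_long %>%
  group_by(key) %>%
  summarise(
    Mean = mean(value, na.rm = TRUE),
    Q1   = quantile(value, probs = 0.25, na.rm = TRUE, names = FALSE),
    Min  = min(value, na.rm = TRUE),
    zero = 0,
    .groups = "drop"
  ) %>%
  pivot_longer(-key, names_to = "stat", values_to = "xint")

# Named colors tie each statistic to its color regardless of level order
stat_colors <- c(Mean = "green", Q1 = "blue", Min = "black", zero = "red")

p <- ggplot(data_long, aes(x = value)) +
  geom_histogram(bins = 20) +
  geom_vline(data = stats_long, aes(xintercept = xint, color = stat)) +
  scale_color_manual(NULL, values = stat_colors) +
  facet_wrap(~key, scales = "free")

p
```

Because stats_long carries the same key values as data_long, facet_wrap splits both layers consistently, and each panel shows exactly four lines.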
    
