
Showing percentages instead of observed counts on the y axis of an R histogram


I have the following script to create a figure:

library(ggplot2)

# Overlaid histograms of the normalized PRS, coloured by case/control status
histogram_endo_cases_controls <- ggplot(prs_data, aes(x = normalized, fill = as.factor(endometriosis))) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 0.2, color = "black") +
  scale_fill_manual(values = c("blue", "yellow"), 
                    labels = c("Controls", "Cases"),
                    name = "Group") +
  labs(title = "Histogram of Polygenic Risk Scores",
       x = "PRS",
       y = "Frequency") +
  theme_minimal()

The data I am plotting are scores for two groups, cases and controls. Cases are coded as 1, and controls are coded as 0.

I would like the y axis to show, for each score, the percentage of individuals within each group (cases and controls) rather than the raw count, because the two groups differ greatly in size (there are far more controls). The plot would then look like the attached example, with percentages on the y axis so that both histograms reach comparable heights.
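If you prefer to keep the histogram, one way to get within-group percentages (a technique not taken from the question) is to map y to a computed variable of stat_bin(). The `density` variable is normalized separately for each fill group, so density multiplied by the bin width is each group's share of individuals in that bin. A minimal sketch, assuming simulated data in place of the real `prs_data` (the group sizes, the means, and the helper variable `bin_width` are hypothetical):

```r
library(ggplot2)

# Hypothetical simulated data: 5000 controls (0) and 200 cases (1); sizes and means are invented
set.seed(42)
prs_data <- data.frame(
  normalized = c(rnorm(5000), rnorm(200, mean = 0.5)),
  endometriosis = rep(c(0, 1), times = c(5000, 200))
)

bin_width <- 0.2  # hypothetical helper so the binwidth and the y scaling stay in sync

p_pct <- ggplot(prs_data, aes(x = normalized, fill = as.factor(endometriosis))) +
  # density is computed per group; density * bin width = share of that group's individuals per bin
  geom_histogram(aes(y = after_stat(density * bin_width)),
                 position = "identity", alpha = 0.5,
                 binwidth = bin_width, color = "black") +
  scale_fill_manual(values = c("blue", "yellow"),
                    labels = c("Controls", "Cases"),
                    name = "Group") +
  # show the proportions as percentages on the axis
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Histogram of Polygenic Risk Scores",
       x = "PRS",
       y = "Percentage within group") +
  theme_minimal()
```

With a fixed bin width this is equivalent to dividing each bar's count by its group's total, so the bars of each group sum to 100% regardless of how many individuals the group contains.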


Reproducible example, where x is the score and y is the number of individuals with that score. Cases and controls combined:

data_all <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(304000, 100500, 50300, 55400, 12)
)

Just cases:

data_cases <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(4000, 500, 300, 400, 2)
)

Just controls:

data_controls <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(300000, 100000, 50000, 55000, 10)
)

As you can see, y is the number of individuals, not the percentage of individuals. So when I plot the two groups, the case bars are so low that the difference between their distributions cannot be seen.
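Because the reproducible example is already aggregated (one row per score, with the count in y), a histogram is not strictly needed: each group's counts can be divided by that group's total and drawn as bars. A minimal sketch using the question's `data_cases` and `data_controls`; the `group` and `pct` columns and the bar width of 0.1 are illustrative choices, not part of the original:

```r
library(ggplot2)

data_cases <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(4000, 500, 300, 400, 2)
)
data_controls <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(300000, 100000, 50000, 55000, 10)
)

# Label each group and stack them into one long data frame
data_cases$group <- "Cases"
data_controls$group <- "Controls"
counts <- rbind(data_cases, data_controls)

# Within-group proportion: each count divided by its own group's total
counts$pct <- counts$y / ave(counts$y, counts$group, FUN = sum)

p_counts <- ggplot(counts, aes(x = x, y = pct, fill = group)) +
  # overlaid semi-transparent bars, mirroring position = "identity" in the original histogram
  geom_col(position = "identity", alpha = 0.5, width = 0.1, color = "black") +
  scale_fill_manual(values = c(Controls = "blue", Cases = "yellow"), name = "Group") +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Polygenic Risk Scores by group",
       x = "PRS",
       y = "Percentage within group") +
  theme_minimal()
```

Here the cases at score 0 become 4000 / 5202 of all cases, so both groups are plotted on the same 0 to 100% scale.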


Solution

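  • Instead of ggplot2::geom_histogram(), use ggplot2::geom_density(alpha = .5). geom_density() estimates a separate density for each fill group and normalizes each curve to an area of 1, so cases and controls are drawn on a comparable scale regardless of group size. Note that the y axis then shows density, not a percentage.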
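A minimal sketch of that suggestion, reusing the question's aesthetics on simulated data (the 5000 controls and 200 cases and their means are invented for illustration; replace `prs_data` with the real data frame):

```r
library(ggplot2)

# Hypothetical simulated data standing in for the real prs_data
set.seed(1)
prs_data <- data.frame(
  normalized = c(rnorm(5000), rnorm(200, mean = 0.5)),
  endometriosis = rep(c(0, 1), times = c(5000, 200))
)

p_density <- ggplot(prs_data, aes(x = normalized, fill = as.factor(endometriosis))) +
  # one kernel density estimate per group, each with area of about 1
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("blue", "yellow"),
                    labels = c("Controls", "Cases"),
                    name = "Group") +
  labs(title = "Density of Polygenic Risk Scores",
       x = "PRS",
       y = "Density") +
  theme_minimal()
```

With the pre-aggregated data from the reproducible example, mapping `weight = y` in aes() should let geom_density() treat the counts as weights, although a density estimated from only five distinct scores will be very coarse.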