r ggplot2 plot histogram aesthetics

How to plot side-by-side histograms for two different groups


I want to plot histograms for two different groups in R. I tried the ggplot2 package and the base R hist function. I want the values of the two groups plotted side by side, not as a stacked histogram.

I can do this with the base hist function, but because the counts for one group are much larger than for the other, the plot is cut off at the top. Is there a way to use relative counts on the y axis so that the plot fits both groups? Data used:

dput(corr_7269[1:5,1:3])
structure(list(rsid = c("ID8138863", "ID7364185", "ID5765371", 
"ID131903", "ID12106592"), corr = c(0.896555463962723, 0.903460756274872, 
0.877977378228679, 0.885319129428826, 0.871646608498413), models = c("n+1", 
"n+1", "n+1", "n+1", "n+1")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

dput(corr_n[1:5,1:3])
structure(list(rsid = c("ID6007530", "ID7364174", "ID6007567", 
"ID112187135", "ID144824037"), corr = c(0.96655907381546, 0.9202563923255, 
0.937166086757865, 0.906450119910952, 0.950030176517754), models = c("n_1", 
"n_1", "n_1", "n_1", "n_1")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

What I tried, using the base hist function in R:

hist(corr_n$corr)
hist(corr_7269$corr,add=T)

Done this way, I cannot tell the two groups apart. The second dataset has 7000 values and the other has only 128, so the plot cannot display everything. Can we use relative counts instead of counts?

Using the ggplot function:

combinedregioncorr <- rbind(corr_7269, corr_n)
library(ggplot2)
g <- ggplot(combinedregioncorr, aes(corr, fill = models)) + geom_histogram(alpha = 0.2)+scale_y_log10()
g+labs(x="rho values different groups",y="count")

With this I can tell the two groups apart, but it comes out as a stacked histogram. I am more interested in seeing the values for each group side by side, more like the previous plot made with the hist function.

Is there a way to do this, or a modification to the code that would achieve it? Thank you.


Solution

  • As you can see below, the default behavior of geom_histogram is to stack the bins:

    ?geom_histogram
    geom_histogram(
      mapping = NULL,
      data = NULL,
      stat = "bin",
      position = "stack",
      ...,
      binwidth = NULL,
      bins = NULL,
      na.rm = FALSE,
      orientation = NA,
      show.legend = NA,
      inherit.aes = TRUE
    )
    

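    To overlay the two groups instead of stacking them, as in your hist(..., add = T) plot, set position = "identity". If you would rather have the bars of the two groups next to each other within each bin, use position = "dodge" instead.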

    From the code you posted:

    g <- ggplot(combinedregioncorr, aes(corr, fill = models)) + geom_histogram(alpha = 0.2, position = "identity") + scale_y_log10()
    

    You can look here for a complete example: The R Graph Gallery
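
The question also asks for relative counts, which the code above does not cover. One possible approach, offered as a sketch rather than the only way to do it, is to map the y aesthetic to after_stat(density) so that each group's bars integrate to 1 regardless of group size. This assumes ggplot2 3.3.0 or later (older versions use ..density.. instead). The data below is a made-up stand-in with the group sizes mentioned in the question (7000 and 128); the bins = 30 and alpha values are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in data (NOT the asker's data): 7000 values in one
# group and 128 in the other, with the same column names as in the question
set.seed(1)
combinedregioncorr <- data.frame(
  corr   = c(rbeta(7000, 30, 3), rbeta(128, 40, 2)),
  models = rep(c("n+1", "n_1"), times = c(7000, 128))
)

# Overlaid histograms on a density scale: the bar areas of each group sum to 1,
# so the small group is no longer dwarfed by the large one
p_overlay <- ggplot(combinedregioncorr, aes(corr, fill = models)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 alpha = 0.4, position = "identity") +
  labs(x = "rho values different groups", y = "density")

# Same density scale, with the bars of the two groups placed next to each
# other inside every bin
p_dodge <- ggplot(combinedregioncorr, aes(corr, fill = models)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 position = "dodge") +
  labs(x = "rho values different groups", y = "density")

print(p_overlay)
print(p_dodge)
```

With a density scale the log y axis from the original code is usually no longer needed, since both groups end up on a comparable scale.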