r ggplot2 ggridges ridgeline-plot

Add points to geom_density_ridges for groups with a small number of observations


I love using geom_density_ridges() with the individual points drawn for each group. However, some groups have very small sample sizes (e.g. n = 1 or 2), so no density ridge is generated for them. For these groups, I'd like to plot the locations of the existing observations anyway, even though no probability density function will be shown.

In this example, I'd like to be able to plot the 2 data points for May on the appropriate line.

    library(tidyverse)
    library(ggridges)
    
    data("lincoln_weather")
    
    #pull weather from all months that are NOT May
    lincoln_weather_nomay<-lincoln_weather[which(lincoln_weather$Month!="May"),]
    
    #pull weather just from May
    lincoln_weather_may<-lincoln_weather[which(lincoln_weather$Month=="May"),]
    
    #recombine, keeping only the first two rows for the May dataset
    new_weather<-rbind(lincoln_weather_nomay,lincoln_weather_may[c(1:2),])
    
    ggplot(new_weather, aes(x = `Min Temperature [F]`, y = Month, fill = Month)) +
      geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 1, point_shape = 21) +
      labs(x = "Minimum temperature (F)", y = '') +
      #guides(fill = FALSE) is deprecated in recent ggplot2; use "none" instead
      guides(fill = "none", color = "none")

geom_density_ridges plot, missing observations for low-sample-size group (May)

How can I add the points for the May observations to the May row, at their correct positions along the x-axis?


Solution

  • Add a separate geom_point() layer to the plot, with its data subset to only the observations from the categories that were not drawn (here, May). The layer inherits the x and y aesthetics from ggplot(), so the points land in the correct row and at the correct x positions. You can apply any of the usual customizations, either to match the points plotted for the other categories or to make these points stand out.

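    ggplot(new_weather, aes(x = `Min Temperature [F]`, y = Month, fill = Month)) +
      geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha = 1, point_shape = 21) +
      #extra layer: only the May rows, with x and y inherited from ggplot()
      geom_point(data = subset(new_weather, Month %in% c("May")),
                 shape = 13) +
      labs(x = "Minimum temperature (F)", y = '') +
      #guides(fill = FALSE) is deprecated in recent ggplot2; use "none" instead
      guides(fill = "none", color = "none")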
    

Ridgeline plot with points added for categories that have too few observations to produce a density ridge
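
If more than one group might be too small, the subset can be computed rather than hard-coded to "May". The following is a minimal sketch of that idea, not part of the original answer: `min_n` and `sparse_weather` are hypothetical names, and the threshold of 3 is an assumption that you may need to adjust to whatever group size leaves a ridge undrawn in your data.

```r
library(ggplot2)
library(ggridges)
library(dplyr)

data("lincoln_weather")

# Rebuild the question's example: every month except May, plus two May rows
new_weather <- bind_rows(
  filter(lincoln_weather, Month != "May"),
  slice_head(filter(lincoln_weather, Month == "May"), n = 2)
)

# Assumed cutoff: groups with fewer rows than this get plain points
min_n <- 3

# Keep only the rows that belong to groups smaller than the cutoff
sparse_weather <- new_weather %>%
  group_by(Month) %>%
  filter(n() < min_n) %>%
  ungroup()

p <- ggplot(new_weather, aes(x = `Min Temperature [F]`, y = Month, fill = Month)) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE,
                      point_alpha = 1, point_shape = 21) +
  # Points for the sparse groups only; x, y and fill are inherited
  geom_point(data = sparse_weather, shape = 21) +
  labs(x = "Minimum temperature (F)", y = NULL) +
  guides(fill = "none", color = "none")

print(p)
```

Using `shape = 21` here makes the extra points match the jittered points on the ridges; swap in another shape (such as 13, as above) if you would rather flag them as coming from an under-sampled group.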