rggplot2ggh4x

Plotting the theoretical distribution of exponential with a minimum on facets (ggh4x)


Users of a Shiny app can test data sets for Poisson, normality, and exponentiality. I am returning the results of the statistical test they chose. In addition, I thought it would be nice to plot the density from the data along with the theoretical distribution. They could be testing multiple sets of data at once, so I am faceting the plot.

From ggplot add Normal Distribution while using `facet_wrap` I found the really great ggh4x package. However, since this could be industry data, there may be a minimum that is not zero.

The problem is that theodensity(distri="exp") uses dexp which doesn't account for a minimum number, so the theoretical distribution plot doesn't match the data.

How can I tell the stat_theodensity that there is an xmin for each facet, which is the min of the data in the facet? I see that fitdistrplus can use different methods to fit an exponential curve, and that, for example, method="mse" would work. Is there a way to pass this through stat_theodensity?

library(ggh4x)

#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/100)+100
data2 <- rexp(n = 500,rate = 1/250)+500
data <- c(data1,data2)

#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)

#make the data for plotting
plot_dat <- data.frame(ID,data)

#make the graph
p <- ggplot(data = plot_dat, aes(x=data))+
  geom_density()+
  stat_theodensity(distri = "exp")+
  facet_wrap(facets = ~ID,scales = "free")
p

#what the first point of the graphs should be
dexp(x = 100-100,rate = 1/100)
#[1] 0.01
dexp(x = 500-500,rate = 1/250)
#[1] 0.004

********EDIT

OK I am getting closer. The following code works, but only for the second pass through the loop. If I change the numbers around for data1 and data2, it is always only the second one that plots the theoretical distribution.

I did ggplot_build after the first loop and it gives an error in fitdist(), which is code 100. I don't know why it would always fail on the first one but not on the second one, even with the same data.

Any ideas?

#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/250)+500
data2 <- rexp(n = 500,rate = 1/100)+250
data <- c(data1,data2)

#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)

#make the data for plotting
plot_dat <- data.frame(ID,data)

#make the graph
p <- ggplot(data = plot_dat, aes(x=data))+
  geom_density(color="red")

#loop through sets and add facets
for (set in unique(plot_dat$ID)){
  xmin <- min(plot_dat$data[ID == set])
  p<-p+
    stat_theodensity(
      data = ~subset(.x, ID == set),
      aes(x = stage(data - xmin, after_stat = x + xmin)),
      distri = "exp"
    )
}

  #stat_theodensity(distri = "exp")+
p<-p+
  facet_wrap(facets = ~ID,scales = "free")
p

Solution

  • I don't know about the statistics of your problem, but if the issue is subtracting a number before calculating the density and afterwards adding it, you might do that with stage(). I couldn't find a more elegant way than hardcoding these values for each set separately, but I'd be happy to hear about more creative solutions.

    library(ggh4x)
    #> Loading required package: ggplot2
    
    #generate 2 exponential distributions with xmin > 0
    data1 <- rexp(n = 500,rate = 1/100)+100
    data2 <- rexp(n = 500,rate = 1/250)+500
    data <- c(data1,data2)
    
    #generate a code for facets
    ID1 <- c(rep("Set 1",times=500))
    ID2 <- c(rep("Set 2",times=500))
    ID <- c(ID1,ID2)
    
    #make the data for plotting
    plot_dat <- data.frame(ID,data)
    
    #make the graph
    ggplot(data = plot_dat, aes(x=data))+
      geom_density() +
      stat_theodensity(
        data = ~ subset(.x, ID == "Set 1"),
        aes(x = stage(data - 100, after_stat = x + 100)),
        distri = "exp"
      ) +
      stat_theodensity(
        data = ~ subset(.x, ID == "Set 2"),
        aes(x = stage(data - 500, after_stat = x + 500)),
        distri = "exp"
      ) +
      facet_wrap(facets = ~ID,scales = "free")
    

    Created on 2022-09-26 by the reprex package (v2.0.1)

    EDIT

    I think OP's update had a problem with non-standard evaluation. It should work when you use a lapply() loop instead of a for-loop because then xmin is not a global variable that might be mistakingly looked up.

    library(ggh4x)
    #> Loading required package: ggplot2
    library(ggplot2)
    
    #generate 2 exponential distributions with xmin > 0
    data1 <- rexp(n = 500,rate = 1/250)+500
    data2 <- rexp(n = 500,rate = 1/100)+250
    data <- c(data1,data2)
    
    #generate a code for facets
    ID1 <- c(rep("Set 1",times=500))
    ID2 <- c(rep("Set 2",times=500))
    ID <- c(ID1,ID2)
    
    #make the data for plotting
    plot_dat <- data.frame(ID,data)
    
    #make the graph
    p <- ggplot(data = plot_dat, aes(x=data))+
      geom_density(color="red") +
      facet_wrap(facets = ~ ID, scales = "free")
    
    #loop through sets and add facets
    p + lapply(unique(plot_dat$ID), function(i) {
      xmin <- min(plot_dat$data[plot_dat$ID == i])
      stat_theodensity(
        data = ~ subset(.x, ID == i),
        aes(x = stage(data - xmin, after_stat = x + xmin)),
        distri = "exp"
      )
    })
    

    Created on 2022-09-27 by the reprex package (v2.0.1)