r tidyverse fitdistrplus

Batch distribution fitting using Tidyverse and fitdistrplus


I have a dataset of 10,000+ rows that looks like this:

P_ID SNUM RNUM X
ID_233 10 2 40.31
ID_233 10 3 23.21
ID_234 12 5 11.00
ID_234 12 6 0.31
ID_234 13 1 0.00
ID_235 10 2 66.23

From this dataset, I want to fit a Gamma distribution to the X values of each distinct P_ID (ignoring, for now, any testing of how well the sampled data fits the distribution).

Using the fitdistrplus package, I can do this by extracting X for an individual P_ID into a vector, running it through fw <- fitdist(data,"gamma"), and then extracting the shape and rate estimates, but this is all very manual.
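
For reference, the manual version for a single P_ID looks roughly like the sketch below. The simulated `data` is only a stand-in for the real data frame, and the names `x` and `fw` are illustrative.

```r
library(fitdistrplus)

set.seed(1)
# Simulated stand-in for the real data frame (assumption, not the real data)
data <- data.frame(
  P_ID = rep(c("ID_233", "ID_234"), each = 100),
  X    = rgamma(200, shape = 2, rate = 0.5)
)

# Extract X for one P_ID, fit a gamma distribution, read off the parameters
x  <- data$X[data$P_ID == "ID_233"]
fw <- fitdist(x, "gamma")

shape_233 <- fw$estimate[["shape"]]
rate_233  <- fw$estimate[["rate"]]
```

Repeating this by hand for every P_ID is the part I want to avoid.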

I would like to find a method using tidyverse to go from the data frame above to:

P_ID Distrib G_Shape G_Rate
ID_233 Gamma 1.21557116 0.09206639
ID_234 Gamma 3.23234542 0.34566432
ID_235 Gamma 2.34555553 0.92344521

How would I achieve this with tidyverse and pipes, rather than with a succession of for loops?


Solution

  • You could apply fitdist to the X values of each P_ID using group_by, store the fitted models in a list column, and then extract the shape and rate estimates from each model.

    library(dplyr)
    library(purrr)
    library(fitdistrplus)
    
    # Fit one gamma model per P_ID, then pull shape and rate out of each fit
    result <- data %>%
      group_by(P_ID) %>%
      summarise(model = list(fitdist(X, "gamma"))) %>%
      mutate(Distrib = "Gamma",
             G_Shape = map_dbl(model, pluck, 'estimate', 'shape'),
             G_Rate  = map_dbl(model, pluck, 'estimate', 'rate'))
    
    result
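
With 10,000+ rows, some groups may not fit cleanly: fitdist stops with an error for a P_ID that has only one observation, and non-positive values such as the 0.00 in the sample may also make the gamma fit fail. A single failure would abort the whole pipeline. The sketch below is one possible way to guard against that, using purrr's possibly so that a failed fit yields NA instead. The simulated `data` and the helper name `safe_fit` are assumptions for illustration only.

```r
library(dplyr)
library(purrr)
library(fitdistrplus)

set.seed(42)
# Simulated stand-in for the real data: three well-populated IDs plus
# one ID with a single observation, which fitdist cannot fit
data <- tibble(
  P_ID = c(rep(c("ID_233", "ID_234", "ID_235"), each = 200), "ID_236"),
  X    = c(rgamma(600, shape = 2, rate = 0.5), 5)
)

# Hypothetical helper: returns NULL instead of stopping when a fit fails
safe_fit <- possibly(function(x) fitdist(x, "gamma"), otherwise = NULL)

result <- data %>%
  group_by(P_ID) %>%
  summarise(model = list(safe_fit(X)), .groups = "drop") %>%
  mutate(Distrib = "Gamma",
         G_Shape = map_dbl(model, ~ if (is.null(.x)) NA_real_ else .x$estimate[["shape"]]),
         G_Rate  = map_dbl(model, ~ if (is.null(.x)) NA_real_ else .x$estimate[["rate"]])) %>%
  # fitdistrplus attaches MASS, which masks select(), so call dplyr's explicitly
  dplyr::select(P_ID, Distrib, G_Shape, G_Rate)

result
```

Rows with NA in G_Shape and G_Rate mark the P_IDs whose fit failed, so they can be inspected separately rather than silently dropped.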