I have a dataset that is as follows (10,000+ Rows):
P_ID | SNUM | RNUM | X |
---|---|---|---|
ID_233 | 10 | 2 | 40.31 |
ID_233 | 10 | 3 | 23.21 |
ID_234 | 12 | 5 | 11.00 |
ID_234 | 12 | 6 | 0.31 |
ID_234 | 13 | 1 | 0.00 |
ID_235 | 10 | 2 | 66.23 |
From this dataset, I want to fit the `X` values of each distinct `P_ID` to a Gamma distribution (ignoring any testing of how well the sampled data fits the distribution).

Using the `fitdistrplus` package, I can achieve this by extracting the `X` values for an individual `P_ID` into a vector, running it through `fw <- fitdist(data, "gamma")`, and then pulling out the `shape` and `rate` estimates, but this is all very manual.
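
For reference, the manual per-ID workflow described above might look roughly like the sketch below. The toy values are made up for illustration (they are not from the real dataset), and the names `x`, `shape` and `rate` are just placeholders.

```r
library(fitdistrplus)

# Hypothetical toy data standing in for the real data frame (strictly positive X)
data <- data.frame(
  P_ID = rep("ID_233", 6),
  X    = c(40.31, 23.21, 12.5, 31.7, 8.9, 19.4)
)

# Pull X for a single P_ID into a plain numeric vector
x <- data$X[data$P_ID == "ID_233"]

# Fit a gamma distribution; fitdist() uses maximum likelihood by default
fw <- fitdist(x, "gamma")

# The estimates are stored as a named numeric vector
shape <- fw$estimate[["shape"]]
rate  <- fw$estimate[["rate"]]
```

Repeating this by hand for every `P_ID` is the part I want to avoid.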
I would like to find a method using tidyverse to go from the data frame above to:
P_ID | Distrib | G_Shape | G_Rate |
---|---|---|---|
ID_233 | Gamma | 1.21557116 | 0.09206639 |
ID_234 | Gamma | 3.23234542 | 0.34566432 |
ID_235 | Gamma | 2.34555553 | 0.92344521 |
How would I achieve this with tidyverse and pipes, rather than a succession of for loops?
You could apply `fitdist` to each individual using `group_by`, then extract the `shape` and `rate` values from each model.
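
    library(dplyr)
    library(purrr)
    library(fitdistrplus)

    result <- data %>%
      group_by(P_ID) %>%
      # Fit one gamma model per P_ID and keep it in a list-column
      summarise(model = list(fitdist(X, "gamma"))) %>%
      # Pull the shape and rate estimates out of each fitted model
      mutate(Distrib = "Gamma",
             G_Shape = map_dbl(model, pluck, "estimate", "shape"),
             G_Rate  = map_dbl(model, pluck, "estimate", "rate")) %>%
      # dplyr:: prefix because MASS (attached by fitdistrplus) masks select()
      dplyr::select(P_ID, Distrib, G_Shape, G_Rate)

    result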
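
One caveat: the sample data contains an `X` of 0.00, and a gamma maximum-likelihood fit may fail on zeros or on groups with very few rows, which would abort the whole pipeline. As a rough sketch (not part of the approach above), you could wrap the fit in `purrr::possibly()` so that a failing group yields `NA` instead of an error. The toy data and the helper names `safe_fit` and `get_est` are assumptions made up for this example.

```r
library(dplyr)
library(purrr)
library(fitdistrplus)

# Hypothetical toy data; ID_B contains a zero, which may make the gamma fit fail
data <- data.frame(
  P_ID = rep(c("ID_A", "ID_B"), each = 6),
  X    = c(40.31, 23.21, 12.5, 31.7, 8.9, 19.4,
           11.00, 0.31, 0.00, 4.2, 7.7, 2.1)
)

# Wrap fitdist() so a failed fit returns NULL instead of stopping the pipeline
safe_fit <- possibly(function(x) fitdist(x, "gamma"), otherwise = NULL)

# Return the named estimate, or NA when the fit failed
get_est <- function(model, name) {
  if (is.null(model)) NA_real_ else model$estimate[[name]]
}

result <- data %>%
  group_by(P_ID) %>%
  summarise(model = list(safe_fit(X))) %>%
  mutate(Distrib = "Gamma",
         G_Shape = map_dbl(model, get_est, "shape"),
         G_Rate  = map_dbl(model, get_est, "rate")) %>%
  dplyr::select(P_ID, Distrib, G_Shape, G_Rate)
```

Groups that could not be fitted then show up as rows with `NA` in `G_Shape` and `G_Rate`, so you can inspect or filter them afterwards.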