I want to calculate the cumulative probabilities for my sample for each factor level and save them to a data frame. However, the calculated probabilities don't reach 1.0 within each group; they stop at, e.g., 0.7, which cannot be right. Only one group ever reaches 1.0.
Here is a reproducible example:
library(datasets)
library(dplyr)
ecdf_fun <- ecdf(iris$Sepal.Width)
dset <- iris %>% group_by(Species) %>%
  reframe(ecdval = ecdf_fun(Sepal.Width))
Which delivers:
Species ecdval
1 setosa 1.000000000
2 setosa 0.993333333
...
51 versicolor 0.833333333
52 versicolor 0.753333333
...
101 virginica 0.960000000
102 virginica 0.960000000
ADD-ON: Ideally, I would like to retrieve the cumulative probabilities together with their respective x values (Sepal.Width):
Species ecdval Sepal.Width
1 setosa 1.000000000 0.6
2 setosa 0.993333333 ...
...
51 versicolor 0.833333333 1.8
52 versicolor 0.753333333 ...
...
101 virginica 0.960000000 2.5
102 virginica 0.960000000 ...
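As Andrew Gustar says, the ecdf() needs to be computed within each group. In the question, ecdf_fun is fitted once to all 150 rows, so inside a species it can only reach 1.0 if that species contains the overall maximum Sepal.Width (here, setosa). Use mutate() instead of reframe() to keep the original data alongside the CDF values: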
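library(dplyr)
library(ggplot2)
# fit a separate ECDF per species and evaluate it at that species' own values
dset <- iris %>% group_by(Species) %>%
  mutate(ecdval = ecdf(Sepal.Width)(Sepal.Width))
ggplot(dset, aes(Sepal.Width, ecdval, col = Species)) + geom_point() + geom_line()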
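If you want to check the result without dplyr, the same per-group idea can be sketched in base R. This is only an illustration, not part of the original answer: it swaps in ave() for group_by()/mutate(), and the names max_by_species, cols and steps are made up for the example. It confirms that every species now reaches 1 and builds the "x value plus cumulative probability" table asked for in the ADD-ON, with one row per distinct Sepal.Width.

```r
# Base R sketch: fit a separate ECDF per species and evaluate it at that
# species' own Sepal.Width values (ave() applies FUN within each group).
dset <- iris
dset$ecdval <- ave(dset$Sepal.Width, dset$Species,
                   FUN = function(x) ecdf(x)(x))

# Largest cumulative probability per species; each should now be 1
max_by_species <- tapply(dset$ecdval, dset$Species, max)
print(max_by_species)

# One row per distinct Sepal.Width within each species, sorted by x
# (hypothetical helper names, chosen for this example only)
cols <- c("Species", "Sepal.Width", "ecdval")
steps <- unique(dset[order(dset$Species, dset$Sepal.Width), cols])
print(head(steps))
```

Because tied Sepal.Width values get the same ecdval, dropping duplicates loses no information, and steps can be plotted directly as a step function per species.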