Tags: r, distribution, sample, group, ecdf

R: My calculated cumulative sample distribution probabilities don't reach 1.0


I want to calculate the cumulative probabilities of my sample for each factor level and save them to a data frame. However, the calculated probabilities don't reach 1.0. They stop at, e.g., 0.7, which cannot be right. Somehow only one group ever reaches 1.0.

Here is a reproducible example:

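library(datasets)
library(dplyr)  # %>%, group_by() and reframe() (reframe() needs dplyr >= 1.1.0)

# ECDF fitted once to the pooled Sepal.Width values of all three species
ecdf_fun <- ecdf(iris$Sepal.Width)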

dset <- iris %>% group_by(Species) %>%
  reframe(ecdval = ecdf_fun(Sepal.Width))

Which delivers:

    Species    ecdval
1   setosa     1.000000000
2   setosa     0.993333333
...
51  versicolor 0.833333333
52  versicolor 0.753333333
...
101 virginica  0.960000000
102 virginica  0.960000000
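A quick base-R check, added here for illustration (the names `pooled_ecdf`, `pooled_vals` and `pooled_max` are hypothetical), shows where the missing 1.0 goes. It evaluates an ECDF fitted to all 150 rows and takes the maximum within each species. Only the species that contains the overall largest Sepal.Width can reach 1, because the pooled ECDF is 1 only at or above the global maximum.

```r
# ECDF fitted to the pooled Sepal.Width values of all three species,
# mirroring ecdf_fun in the question (iris ships with base R)
pooled_ecdf <- ecdf(iris$Sepal.Width)

# Evaluate the pooled ECDF at every observation ...
pooled_vals <- pooled_ecdf(iris$Sepal.Width)

# ... then take the largest value reached inside each species
pooled_max <- tapply(pooled_vals, iris$Species, max)
print(pooled_max)

# Compare with each species' largest Sepal.Width: only the species holding
# the overall maximum gets a pooled ECDF value of exactly 1
print(tapply(iris$Sepal.Width, iris$Species, max))
```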

ADD-ON: Ideally, I would like to retrieve the cumulative probabilities together with their respective x values (Sepal.Width).

    Species    ecdval       Sepal.Width
1   setosa     1.000000000  0.6
2   setosa     0.993333333  ...
...
51  versicolor 0.833333333  1.8
52  versicolor 0.753333333  ...
...
101 virginica  0.960000000  2.5
102 virginica  0.960000000  ...

Solution

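  • As Andrew Gustar says, the ecdf() needs to be computed per group. ecdf_fun in the question is fitted to all 150 rows at once, so only the group containing the overall largest Sepal.Width (setosa) ever reaches 1.0. Build the ecdf inside the grouped call instead, and use mutate() rather than reframe() so the original columns, including Sepal.Width, are kept alongside the ECDF values: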

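    library(dplyr)
    library(ggplot2)
    
    # One ECDF per Species, evaluated at that species' own values;
    # mutate() keeps all original columns, so Sepal.Width stays next to ecdval
    dset <- iris %>% group_by(Species) %>%
      mutate(ecdval = ecdf(Sepal.Width)(Sepal.Width))
    
    ggplot(dset, aes(Sepal.Width, ecdval, col=Species)) + geom_point() + geom_line()
    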

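If dplyr is not available, the same per-group fix can be sketched in base R. This is an alternative to the dplyr answer, not the answerer's own code, and `dset_base`, `by_species` and `ecdf_table` are hypothetical names. `ave()` fits one ECDF per species and returns the values in the original row order. For the ADD-ON, `split()` plus `knots()` (which returns the sorted jump points of an ECDF) gives one row per distinct Sepal.Width value per species, with its cumulative probability.

```r
# Per-row ECDF values: ave() splits Sepal.Width by Species, applies FUN to
# each piece and puts the results back in the original row order
dset_base <- iris
dset_base$ecdval <- ave(dset_base$Sepal.Width, dset_base$Species,
                        FUN = function(x) ecdf(x)(x))

# Every species now tops out at exactly 1
print(tapply(dset_base$ecdval, dset_base$Species, max))

# One row per distinct Sepal.Width value within each species, sorted by x
by_species <- split(iris$Sepal.Width, iris$Species)
ecdf_table <- do.call(rbind, Map(function(x, sp) {
  Fn <- ecdf(x)
  xs <- knots(Fn)  # sorted distinct values of x
  data.frame(Species = sp, Sepal.Width = xs, ecdval = Fn(xs))
}, by_species, names(by_species)))
rownames(ecdf_table) <- NULL
print(head(ecdf_table))
```

The per-row version keeps duplicates (tied Sepal.Width values share one ECDF value), whereas the `knots()` table collapses ties, which is usually the more convenient shape for plotting a step function.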