Calculating ECDF in R for time period (lubridate)


I am trying to find a way to calculate the ECDF for specific values in my data. I have a data frame that looks something like this:

Name      Type             Value 
B         pace_20min_ms    6M 2S
A         pace_20min_ms    5M 32S

What I want to do is find the value of the ECDF for, say, A, so that I can state: A is faster than 65% of the people who have done the test. I am struggling with the "Value" column, because it is stored in the lubridate minutes-and-seconds format.

What I figured out so far is how to calculate specific quantiles:

quantile(dat$Value, probs = c(0.1, 0.25, 0.5, 0.75, 0.9), type = 1)
[1] "3M 57S" "4M 25S" "4M 56S" "5M 32S" "6M 2S"

Maybe it's not that hard to calculate it the other way around, but I don't know how to do it. Thank you so much!


Solution

  • You could convert to seconds and back again, something like:

    > r <- colSums(sapply(strsplit(gsub('[MS]', '', x), ' '), as.integer)*c(60, 1)) |> 
    +   quantile(probs=c(0.1, 0.25, 0.5, 0.75, 0.9), type=1)
    > sprintf('%sM %sS', r %/% 60, r %% 60) |> setNames(names(r))
         10%      25%      50%      75%      90% 
    "0M 20S" "1M 23S" "3M 24S"  "5M 5S" "6M 31S" 
    

    I'm not sure exactly how your data is formatted, but you get the idea.


    Data:

    > n <- 100
    > set.seed(42)
    > x <- mapply(\(x, y) sprintf('%sM %sS', x, y), 
    +             sample(0:7, n, replace=TRUE), 
    +             sample(0:34, n, replace=TRUE))
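
The same convert-to-seconds idea also covers the "other way around" from the question. Below is a minimal sketch, assuming `Value` is available as text like "5M 32S". The vector `pace` and the helper `to_seconds()` are made up for illustration, with the values borrowed from the quantile output in the question as stand-in data. If `Value` is a real lubridate Period, `lubridate::period_to_seconds()` should give the seconds directly, though I have not checked that against your data.

```r
# Hypothetical stand-in data in the "xM yS" text format shown in the question
pace <- c("6M 2S", "5M 32S", "4M 56S", "4M 25S", "3M 57S")

# Hypothetical helper: convert "xM yS" strings to total seconds
# (assumes the minutes and seconds parts are both always present)
to_seconds <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]+)M ([0-9]+)S$", x))
  vapply(parts, function(p) as.numeric(p[2]) * 60 + as.numeric(p[3]), numeric(1))
}

secs <- to_seconds(pace)    # 362 332 296 265 237
Fn <- ecdf(secs)            # Fn(t) = share of observations <= t

a <- to_seconds("5M 32S")   # A's time in seconds

# Share of people whose time is <= A's time (A included)
Fn(a)                       # 0.8

# A lower time is faster, so the share of people A beats
# is the share with a strictly larger time
mean(secs > a)              # 0.2

# quantile(type = 1) inverts the ECDF, so Fn(quantile at p) is never below p
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
all(Fn(quantile(secs, probs = p, type = 1)) >= p)
```

Which of the two numbers you report depends on what "faster than" should mean when times are tied: `Fn(a)` counts everyone at or below A's time, while `mean(secs > a)` counts only people strictly slower than A.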