I am trying to calculate the ECDF (empirical cumulative distribution function) at specific values of my data. I have a data frame that looks something like this:
Name Type          Value
B    pace_20min_ms 6M 2S
A    pace_20min_ms 5M 32S
What I want to do is find the ECDF value for a given person, for example A, so that I can say: "A is faster than 65% of the people who have done the test." But I am struggling with the Value column, because it is in lubridate's minutes-and-seconds format.
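Written out, for $n$ recorded times $x_1, \dots, x_n$ (in seconds), the ECDF at a time $t$ and the share of people with a slower (larger) time than $t$ are:

$$\hat F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}, \qquad 1 - \hat F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i > t\}.$$

Since a lower time is faster, "A is faster than $p\%$ of people" corresponds to $p = 100\,\bigl(1 - \hat F_n(t_A)\bigr)$, where $t_A$ is A's time; $\hat F_n(t_A)$ on its own is the share of people at least as fast as A (A included).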
What I figured out so far is how to calculate specific quantiles:
quantile(dat$Value, probs = c(0.1, 0.25, 0.5, 0.75, 0.9), type = 1)
[1] "3M 57S" "4M 25S" "4M 56S" "5M 32S" "6M 2S"
It may not be hard to go the other way around (from a time to its percentile), but I don't know how to do it. Thank you so much!
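A minimal base-R sketch of the direction asked about (time to share of people beaten), assuming `Value` holds strings like `"5M 32S"` that always contain both a minutes and a seconds part. The data frame below is made up for illustration and `to_seconds()` is a hypothetical helper; if `Value` is really a lubridate Period, `lubridate::period_to_seconds(dat$Value)` should give the seconds directly.

```r
# Made-up example data in the same shape as the question's data frame
dat <- data.frame(
  Name  = c("A", "B", "C", "D", "E"),
  Type  = "pace_20min_ms",
  Value = c("5M 32S", "6M 2S", "4M 56S", "7M 10S", "6M 45S")
)

# Hypothetical helper: "xM yS" -> total seconds
# (assumes every value has both an M and an S part)
to_seconds <- function(v) {
  parts <- strsplit(gsub("[MS]", "", v), " ")
  vapply(parts, function(p) sum(as.integer(p) * c(60, 1)), numeric(1))
}

secs <- to_seconds(dat$Value)
Fn <- ecdf(secs)  # Fn(t) = share of times <= t

# Lower time = faster, so "faster than" = share with a strictly larger time
a <- secs[dat$Name == "A"]
faster_than <- 1 - Fn(a)
sprintf("A is faster than %.0f%% of people", 100 * faster_than)
```

Because `ecdf()` returns a function, `Fn` can be evaluated at any time in seconds, including times that do not occur in the data.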
You could convert to seconds and back again, something like this:
> r <- colSums(sapply(strsplit(gsub('[MS]', '', x), ' '), as.integer)*c(60, 1)) |>
+ quantile(probs=c(0.1, 0.25, 0.5, 0.75, 0.9), type=1)
> sprintf('%sM %sS', r %/% 60, r %% 60) |> setNames(names(r))
10% 25% 50% 75% 90%
"0M 20S" "1M 23S" "3M 24S" "5M 5S" "6M 31S"
I'm not sure exactly how your data is formatted, but you get the idea.
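The same seconds conversion also covers the original question: base R's `ecdf()` turns the seconds into a function mapping a time to the share of times at or below it, and it can be evaluated for every person at once. A sketch, assuming every value has both an `M` and an `S` part; the five times are made-up example data:

```r
# Made-up example times in the same "xM yS" format
x <- c("6M 2S", "5M 32S", "4M 56S", "3M 57S", "4M 25S")

# Same conversion as in the answer: drop the unit letters, split, weight minutes by 60
s <- colSums(sapply(strsplit(gsub('[MS]', '', x), ' '), as.integer) * c(60, 1))

Fn <- ecdf(s)  # Fn(t) = share of times <= t

# Lower time = faster, so each person beats the share with a strictly larger time
beats <- 1 - Fn(s)
data.frame(Value = x, seconds = s, faster_than = sprintf('%.0f%%', 100 * beats))
```

Using `1 - Fn()` rather than `Fn()` matters here because a smaller time is better; with ties, people who share a time are not counted as beaten.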
Data:
> n <- 100
> set.seed(42)
> x <- mapply(\(x, y) sprintf('%sM %sS', x, y),
+ sample(0:7, n, replace=TRUE),
+ sample(0:34, n, replace=TRUE))