I am trying to find the cumulative distribution function of a particular type of order statistics; The progressively censored uniform order statistics. I have produced the following R
code:
#The CDF for rth order progressively censored uniform order statistics
n <- 30 #Total number of experimental units
m <- 15 #Desired numbers of failure
R <- c(rep(0, m - 1), n - m) #Progressive censoring scheme
order <- m #Order of the censored order statistics (Here, maximum order)
gam <- NA
Cr <- NA
for(i in 1 : order)
{
gam[i] <- m - i + 1 + sum(R[i : m])
}
for(i in 1 : order)
{
Cr[i] <- prod(gam[1 : i])
}
air <- array(dim = c(order, order))
for(i in 1 : order)
{
for (j in 1 : order) {
if(i != j)
{
air[i, j] <- 1/(gam[j] - gam[i])
}
}
}
A <- NA
for(i in 1 : order)
{
A[i] = prod(na.omit(air[i,]))
}
#CDF of progressively censored uniform order statistics
progU_CDF <- function(u)
{
CDF = NA
for(i in 1 : length(u))
{
CDF[i] <- 1 - (Cr[order] * sum((A/gam) * ((1 - u[i])^(gam))))
}
return(CDF)
}
Now, progU_CDF(0)
should give 0 and progU_CDF(1)
should give 1. Although, in this case, progU_CDF(0)
is producing number very close to 0, not exactly 0. Mathematically, the idea is that when u = 0
, Cr[order] * sum(A/gam) = 1
.
Moreover, when I'm plotting the CDF, the shape is desirable, i.e. monotonically non-decreasing.
plot(seq(0, 1, 0.01), progU_CDF(seq(0, 1, 0.01)), type = "l")
Order Statistics CDF plot n = 30, m = 15
However, things start to go crazy when I take n = 50
and m = 25
. Nothing is changed, but Cr[order] * sum(A/gam)
is nowhere close to 1 when u = 0
. And the CDF plot looks like this:
Order Statistics CDF plot n = 50, m = 25
I suspect this is caused due to the arithmatic operation on extremely large numbers. But I'm unable to trace it.
What's more confusing is that Cr[order] * sum(A/gam)
and sum(Cr[order] * A/gam)
are producing two different numbers, which are counterintuitive, since Cr[order]
is a constant.
My question is, why is it working for n = 30, m = 15
but not for n = 50, m = 25
? Is there any way to deal with such large numbers so that Cr[order] * sum(A/gam)
is always close to 1 whenever u = 0
, no matter the value of n
and m
?
A
results in a vector of very small numbers of alternating sign. The errors compound so quickly that performing the calculations in log scale still results in large errors. Multiple precision with Rmpfr
seems to be the way to go.
Vectorized with Rmpfr
:
library(Rmpfr)
fprogU_CDF <- function(n, m, precBits = 128) {
R <- c(numeric(m - 1), n - m) #Progressive censoring scheme
gam <- mpfr(n:(n - m + 1), precBits)
Cr <- prod(gam)
air <- -1/outer(gam, gam, "-")
diag(air) <- 1
A <- apply(air, 1, prod)
#CDF of progressively censored uniform order statistics
function(u) {
as.numeric(1 - Cr*colSums(A/gam*outer(gam, 1 - u, \(x, y) y^x)))
}
}
Testing:
progU_CDF1 <- fprogU_CDF(30, 15)
progU_CDF1(0)
#> [1] 2.15461e-27
progU_CDF1(1)
#> [1] 1
progU_CDF2 <- fprogU_CDF(50, 25)
progU_CDF2(0)
#> [1] 2.093805e-18
progU_CDF2(1)
#> [1] 1
progU_CDF3 <- fprogU_CDF(20, 10)
progU_CDF3(0)
#> [1] -2.092626e-32
progU_CDF3(1)
#> [1] 1
progU_CDF4 <- fprogU_CDF(200, 40, 40, 256)
progU_CDF4(0)
#> [1] 1.468873e-24
progU_CDF4(1)
#> [1] 1
curve(progU_CDF1(x), 0, 1, col = "blue", ylab = "CDF", xlab = "u")
curve(progU_CDF2(x), 0, 1, col = "orange", add = TRUE)
curve(progU_CDF3(x), 0, 1, col = "purple", add = TRUE)
curve(progU_CDF4(x), 0, 1, col = "red", add = TRUE)
A more condensed version that seems to have better precision.
fprogU_CDF <- function(n, m, precBits = 128) {
d <- mpfr((m - 1):0, precBits)
g <- mpfr(n:(n - m + 1), precBits)
C <- factorial(mpfr(n, precBits))/factorial(mpfr(n - m, precBits))/
factorial(d)/factorial(rev(d))*rev(rep(c(1, -1), length.out = m))/g
#CDF of progressively censored uniform order statistics
function(u) as.numeric(1 - colSums(C*outer(g, 1 - u, \(x, y) y^x)))
}
Testing:
progU_CDF1 <- fprogU_CDF(30, 15)
progU_CDF1(0)
#> [1] 0
progU_CDF1(1)
#> [1] 1
progU_CDF2 <- fprogU_CDF(50, 25)
progU_CDF2(0)
#> [1] 0
progU_CDF2(1)
#> [1] 1
progU_CDF3 <- fprogU_CDF(20, 10)
progU_CDF3(0)
#> [1] 0
progU_CDF3(1)
#> [1] 1
progU_CDF4 <- fprogU_CDF(200, 40, 256)
progU_CDF4(0)
#> [1] 3.845097e-25
progU_CDF4(1)
#> [1] 1
curve(progU_CDF1(x), 0, 1, col = "blue", ylab = "CDF", xlab = "u")
curve(progU_CDF2(x), 0, 1, col = "orange", add = TRUE)
curve(progU_CDF3(x), 0, 1, col = "purple", add = TRUE)
curve(progU_CDF4(x), 0, 1, col = "red", add = TRUE)