I have the following problem: given k (=10) discrete independent random variables X_i with n_i ( = 5 to 20) values each.
Problem: compute quantiles of the distribution of the sum X = X_1+...+X_k.
Here X has n=n_1 x n_2 ... n_k distinct values which is too large to list them all together with their probabilities.
I tried several approaches:
(A) Convolution:
each X_j is approximated with Y_j=X_j+Z, where Z is a normal N(0,sigma) variable with small sigma. Then Y_j is a probability mixture of the normal variables N(x_j,sigma), where the x_j runs over all values of X_j, and has a highly oscillatory density.
The density of Y=\sum Y_j is the convolution of the densities of the Y_j.
I need this density at a sizeable number of points and this turns out to be too slow. The issue seems to be the convergence of the convolution integrals slowed down by the oscillatory nature of the densities of the Y_j. When the densities are behaved better (e.g. normal RVs), the computation of such a convolution is quite fast.
(B) Characteristic function:
X will be approximated with Y=X+Z, where Z is normal N(0,\sigma) with small sigma. Y has a density (which it is impossible to compute directly) but the characteristic function (continuous Fourier transform) cf_Y of Y can easily be computed analytically (without knowing the density of Y)
Now let s be a numeric vector. I want to get the density f_Y(s) of Y evaluated along s. The proper way of doing this would be to apply the inverse continuous Fourier transform to the function cf_Y at each point in s.
This is far too slow. That's why I tried to apply the inverse discrete Fourier transform to the vector of values cf_Y(s) and that does not yield anything reasonable.
This baffles me since I was under the impression that the discrete Fourier transform is an approximation to the continuous Fourier transform and so should yield the values of the latter, if the value grid s is fine enough.
Should this work?
Note that this inversion would definitely work if I could compute the discrete Fourier transform of the density f_Y along s, but regrettably this is not possible since
(a) the density of f_Y is far too complicated, and
(b) the discrete Fourier transform of a sum Y = Y_1+Y_2+...+Y_k of independent random variables Y_j is not the product of the discrete Fourier transforms of the Y_j.
How can I approach this problem?
Here is a small example that shows that the convolution fails quickly due to the oscillatory nature of the densities involved:
library(bayesmeta)
# density of $(1/10)*\sum_{j=1}{10}N(j,0.01$
# (convex sum of normal distributions)
#
f <- Vectorize(function(s) sum(vapply(1:10,
FUN = function(j) dnorm(s,mean=j,sd=0.01)/10, FUN.VALUE=0
)))
g <- function(s) dnorm(s,mean=0,sd=0.01)
cat("\n\n")
for(i in 1:5){
cat("Doing convolution ",i,"\n")
g <- convolve(g,f)$density
}
cat("\nConvolutions finished, plotting density.")
s <- seq(0,100,length.out=1024)
matplot(s,g(s),type="l")
One possibility is to use the packages discreteRV and distr. With discreteRV, you can get the distribution of the sum of independent random variables:
library(discreteRV)
X1 <- RV(c(0, 1, 2), c(0.5, 0.25, 0.25))
X2 <- RV(c(1, 2), c(0.5, 0.5))
X3 <- RV(c(0, 1, 2, 3), c(0.4, 0.2, 0.2, 0.2))
sumXi <- SofI(X1, X2, X3)
Now you can use distr to compute the quantiles:
library(distr)
D <- DiscreteDistribution(
supp = attr(sumXi, "outcomes"),
prob = attr(sumXi, "probs")
)
# quantile 0.5:
q.l(D)(0.5)