I am stuck with a problem that at first seems to be a no problem at all.
I would like to write a function, which would take a distribution name and parameters as arguments, and return the expected value of the given distribution. In my code, I will use Beta distribution with parameters alpha = 0.2 and beta = 0.3. The expected value of this distribution is: 0.2 / (0.2 + 0.3) = 0.2 / 0.5 = 0.4
The easiest way seems to be sampling:
estimatedExpectedValue <- function(distribution, parameters) {
do.call(get(paste0("r",distribution)), as.list(c(10000, parameters))) %>% mean
}
estimatedExpectedValue("beta",c(0.2,0.3))
[1] 0.4006945
While, this is easy to implement, the sample mean is "only" the maximum likelihood estimate of the expected value, and therefore it always has some error. It depends on the context, whether this matters or not.
Then we also know that the definition of the expected value is:
...where f(x) is the density function, and in R we can integrate numerically, so:
expectedValueFromIntegration <- function(distribution, parameters) {
fnToBeIntegrated <- Vectorize(
function(x) {x * do.call(get(paste0("d",distribution)), as.list(c(x, parameters)))},
vectorize.args = c("x"))
integrate(fnToBeIntegrated, lower = -Inf, upper = Inf)
}
expectedValueFromIntegration("beta",c(0.2,0.3))
Error in integrate(fnToBeIntegrated, lower = -Inf, upper = Inf) :
non-finite function value
But, as you can see, for some reason dbeta gave out NaN at some point, and integrate function did not like it. I am well aware that the support for the Beta distribution starts from 0 and ends at 1, but the density should be 0 outside the support, and mathematically everything should be just fine for any distribution (even for a discrete distribution), if we just integrated over the real axis.
But in R, however, if I integrate over the support, it works:
expectedValueFromIntegration <- function(distribution, parameters) {
fnToBeIntegrated <- Vectorize(
function(x) {x * do.call(get(paste0("d",distribution)), as.list(c(x, parameters)))},
vectorize.args = c("x"))
integrate(fnToBeIntegrated, lower = 0, upper = 1)
}
expectedValueFromIntegration("beta",c(0.2,0.3))
0.4 with absolute error < 1.9e-05
But, now I need to know (i.e. input) the support for the distribution, or at least create a lookup table of some kind, since to my understanding R has this support information for different distributions stored nowhere, am I right?
Is there a third way to do this? I am looking a general way of doing this, since I prefer not to do ifelse-structure, and using a different approach for each and every distribution.
If you need the support of distribution, you could use q*
(quantile function) to solve the support by specifying c(0,1)
as the quantiles.
Below are just some examples to give you the idea, and you can customize it according to your coding style
# support of beta distribution
> qbeta(0:1, 0.2, 0.3)
[1] 0 1
# support of gamma distribution
> qgamma(0:1, 0.2)
[1] 0 Inf
# support of norm distribution
> qnorm(0:1)
[1] -Inf Inf