Tags: r, statistics, probability, stochastic

"One after the other" realisation of discrete random variables


I'm stuck with the following problem:

We are given n+1 discrete random variables:

X = {1,...,n} with P(X = i) = p_i
Y_i = {1,...,n_i} with P(Y_i = j) = p_ij, for i = 1,...,n

We do the following:

  1. We draw from X and the result determines which Y_i we choose for the next step: If x = a, we use Y_a.
  2. We draw from this Y_a.
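To make the two steps concrete, here is a minimal base-R simulation sketch (it does not use discreteRV). The function name `draw_two_stage` and its arguments are hypothetical, chosen only for illustration; the example values are the ones from the example further down.

```r
# Hypothetical helper: simulate the two-stage draw `size` times.
# p:        probabilities of X over {1, ..., n}
# outcomes: list whose i-th element holds the outcomes of Y_i
# probs:    list whose i-th element holds the probabilities of Y_i
draw_two_stage <- function(size, p, outcomes, probs) {
  # Step 1: draw x from X for every replicate
  x <- sample(seq_along(p), size = size, replace = TRUE, prob = p)
  # Step 2: for each replicate, draw once from the chosen Y_x
  # (index with sample.int so a single-outcome Y_i is handled correctly)
  vapply(x, function(a) {
    o <- outcomes[[a]]
    o[sample.int(length(o), size = 1, prob = probs[[a]])]
  }, numeric(1))
}

set.seed(1)
z <- draw_two_stage(1e5, p = c(0.3, 0.7),
                    outcomes = list(c(2, 3), c(1, 5, 20)),
                    probs = list(c(0.5, 0.5), c(0.3, 0.6, 0.1)))
mean(z)  # should be close to the exact EV of 4.46
var(z)   # should be close to the exact Var of about 20.77
```

The simulated mean and variance only approximate the exact values; they are useful as a sanity check for any closed-form result.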

Now my questions to this:

  1. How do I get the expected value and the variance of the whole process?
  2. Can this "process" be defined by a single random variable?
  3. Assume we only know the EV and Var of all Y_i, but not all (or even none) of the probabilities. Can we still calculate the EV and Var of the whole process?
  4. If 2) can be done, how can I do this efficiently in R?

To give you an example of what I've tried:

X = {1,2} with P(x = 1) = 0.3 and P(x = 2) = 0.7
Y_1 = {2,3} with P(y_1 = 2) = 0.5 and P(y_1 = 3) = 0.5
Y_2 = {1,5,20} with P(y_2 = 1) = 0.3, P(y_2 = 5) = 0.6 and P(y_2 = 20) = 0.1

I have tried to combine these into a single random variable Z, but I'm not sure whether that can be done this way:

Z = {2,3,1,5,20} with probabilities (0.5*0.3, 0.5*0.3, 0.3*0.7, 0.6*0.7, 0.1*0.7)
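A minimal base-R sketch of this flattening, assuming each Y_i is given as a vector of outcomes plus a vector of probabilities: every probability of Y_i is multiplied by p_i, and an outcome that appears in more than one Y_i is merged by adding its probabilities, i.e. P(Z = z) = sum_i p_i * P(Y_i = z). The helper `mixture_pmf` is hypothetical.

```r
# Hypothetical helper: flatten the two-stage draw into the pmf of a single Z.
mixture_pmf <- function(p, outcomes, probs) {
  z  <- unlist(outcomes)
  # weight the probabilities of Y_i by p_i = P(X = i)
  pz <- unlist(Map(function(w, q) w * q, p, probs))
  # merge duplicate outcomes by summing their probabilities
  pz <- tapply(pz, z, sum)
  data.frame(outcome = as.numeric(names(pz)), prob = as.numeric(pz))
}

pmf <- mixture_pmf(p = c(0.3, 0.7),
                   outcomes = list(c(2, 3), c(1, 5, 20)),
                   probs = list(c(0.5, 0.5), c(0.3, 0.6, 0.1)))
pmf
sum(pmf$prob)                                   # sums to 1 (up to rounding)
ez <- sum(pmf$outcome * pmf$prob)               # E(Z): 4.46
vz <- sum(pmf$outcome^2 * pmf$prob) - ez^2      # V(Z) = E(Z^2) - E(Z)^2: 20.7684
```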

The weighted EV is correct (E(Z) equals E(Y_1)*0.3 + E(Y_2)*0.7), but the "weighted" Var is different, assuming it is correct to use the formula for the variance of a linear combination of independent random variables. (Maybe that formula is simply the wrong one for the combined variance.)
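One way to see which random variable the "weighted" Var formula actually describes: for independent Y_1 and Y_2, V(0.3*Y_1 + 0.7*Y_2) = 0.3^2*V(Y_1) + 0.7^2*V(Y_2). The base-R sketch below (the helper names `ev` and `vr` are hypothetical) builds that linear combination W explicitly from all pairs of outcomes and checks that it has the same mean as Z but a different variance.

```r
# Sketch: W = 0.3*Y_1 + 0.7*Y_2 with Y_1 and Y_2 independent.
y1 <- c(2, 3);      p1 <- c(0.5, 0.5)
y2 <- c(1, 5, 20);  p2 <- c(0.3, 0.6, 0.1)

# Joint distribution of independent (Y_1, Y_2): all pairs, product probabilities
w  <- as.vector(outer(0.3 * y1, 0.7 * y2, "+"))
pw <- as.vector(outer(p1, p2, "*"))

ev <- function(x, p) sum(x * p)                 # expected value
vr <- function(x, p) sum(x^2 * p) - ev(x, p)^2  # variance

ev(w, pw)                                       # 4.46, same mean as Z
vr(w, pw)                                       # 13.3554
vr(y1, p1) * 0.3^2 + vr(y2, p2) * 0.7^2         # 13.3554, the "weighted" Var
```

So the "weighted" Var is the variance of W, a different random variable from Z (whose variance is 20.7684), even though both share the same EV.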

I used R and the package "discreteRV":

install.packages("discreteRV")
library(discreteRV)

#defining the RVs
Y_1 <- RV(outcomes = c(2, 3), probs = c(0.5, 0.5))        # occurs 30% of the time
Y_2 <- RV(outcomes = c(1, 5, 20), probs = c(0.3, 0.6, 0.1)) # occurs 70% of the time

Z <- RV(outcomes = c(2, 3, 1, 5, 20), 
        probs = c(0.5*0.3, 0.5*0.3, 0.3*0.7, 0.6*0.7, 0.1*0.7))


#calculating the EVs
E(Z)
E(Y_1)*0.3 + E(Y_2)*0.7

#calculating the VARs
V(Z)
V(Y_1)*(0.3)^2 + V(Y_2)*(0.7)^2

Thank you for your help.


Solution

  • Actually, Z lives on the larger sample space spanned by the outcomes of Y1 and Y2; it is a mixture of the two components, not a linear superposition of them. In other words, Z should be read as Z = [0.3*Y1, 0.7*Y2] (Y1 with probability 0.3, Y2 with probability 0.7) rather than Z = 0.3*Y1 + 0.7*Y2.

    Since we have

    V(Z) = E(Z**2)-E(Z)**2

    > E(Z**2) -E(Z)**2
    [1] 20.7684
    
    > V(Z)
    [1] 20.7684
    

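    It is then easy to see why: E(Z**2) = 0.3*E(Y1**2) + 0.7*E(Y2**2) is linear in the weights, whereas E(Z)**2 = (0.3*E(Y1) + 0.7*E(Y2))**2 contains a cross-product term between Y1 and Y2. This is what makes V(Z) != V(Y_1)*(0.3)^2 + V(Y_2)*(0.7)^2; that formula is the variance of the linear combination 0.3*Y1 + 0.7*Y2 of independent variables, not of the mixture Z.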
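Regarding questions 1, 3 and 4: by the law of total expectation and the law of total variance, E(Z) = sum_i p_i*E(Y_i) and V(Z) = sum_i p_i*V(Y_i) + sum_i p_i*(E(Y_i) - E(Z))^2. So the p_i together with the EV and Var of every Y_i are enough, and the individual probabilities p_ij are not needed. A minimal base-R sketch follows; `mixture_moments` is a hypothetical helper, and the inputs 2.5, 0.25, 5.3 and 27.21 are E(Y_1), V(Y_1), E(Y_2) and V(Y_2) for the question's variables.

```r
# Hypothetical helper: EV and Var of the two-stage draw from moments only.
# p: P(X = i); m: E(Y_i); v: V(Y_i)   (vectors of the same length n)
mixture_moments <- function(p, m, v) {
  stopifnot(length(p) == length(m), length(m) == length(v),
            isTRUE(all.equal(sum(p), 1)))
  ez <- sum(p * m)                        # law of total expectation
  vz <- sum(p * v) + sum(p * (m - ez)^2)  # within-group + between-group variance
  c(EV = ez, Var = vz)
}

res <- mixture_moments(p = c(0.3, 0.7), m = c(2.5, 5.3), v = c(0.25, 27.21))
res   # EV = 4.46, Var = 20.7684, matching E(Z) and V(Z)
```

Because the function uses only vectorised sums, it should scale to a large n without explicit loops. For two components the formula reduces to 0.3*V(Y_1) + 0.7*V(Y_2) + 0.3*0.7*(E(Y_1) - E(Y_2))^2.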