Suppose I have a vector of positive weights a=(a1, a2, a3, a4)
such that a2=a3
and a1+a2+a3+a4=1
. Is there any way to sample this kind of weights using R? I tried to think about using the Dirichlet distribution, but it gives no mechanism to force two of the variates to be equal.
To sample evenly across the set {(a1, a2, a3, a4 | a2=a3, a1+a2+a3+a4=1, a1>0, a2>0, a3>0, a4>0}
, I would first sample the value for a2
(which is equal to a3
). To do this we need to know the distribution of this value. If a2 = a3 = r
, then we have a1+a4 = 1-2r
; for positive a1 and a4 there is a line segment of length (1-2k)*sqrt(2)
containing all feasible values of a1
and a4
. Integrating, the probability that a2
is k
or less is 4(k - k^2)
. In more detail:
Prob (a2 <= k) = Integral(0 to k) (1-2r)*sqrt(2) dr / Integral(0 to 0.5) (1-2r)*sqrt(2) dr
= ((k-k^2)*sqrt(2)) / (sqrt(2)/4)
= 4k - 4k^2
Thus, we can sample values for a2
by selecting a uniformly distributed value u~U(0, 1)
and setting a2
to equal the value k
for which 4k - 4k^2 = u
. Solving via the quadratic formula, this yields:
a2 = 0.5 * (1 - sqrt(1-u))
In R, we can sample 1000 values for a2
with:
set.seed(144)
a2 <- 0.5 * (1 - sqrt(1 - runif(1000)))
a3 <- a2
Given a fixed value a2 = a3 = k
, the value of a1
is uniformly distributed in [0, 1-2k]
:
a1 <- runif(1000) * (1 - 2*a2)
Having specified a1
, a2
, and a3
, there is only one possible value for a4
:
a4 <- 1 - a1 - a2 - a3
We can take a look at some of our sampled values:
head(cbind(a1, a2, a2, a4))
# a1 a2 a2 a4
# [1,] 0.83455239 0.01251016 0.01251016 0.14042729
# [2,] 0.02744599 0.22932773 0.22932773 0.51389856
# [3,] 0.45835472 0.23860119 0.23860119 0.06444291
# [4,] 0.36843649 0.14679703 0.14679703 0.33796946
# [5,] 0.35109881 0.08702039 0.08702039 0.47486041
# [6,] 0.02916818 0.19942616 0.19942616 0.57197949
Here is the distribution of a1
values (note that by symmetry this is identical to the distribution of the a4
values). Because we select a1
uniformly in the range [0, 1-2*a2]
, lower values are more common than higher values:
Here is the distribution of a2
values (by definition this is the same as the distribution of the a3
values). The shape of the distribution is similar to the one of a1
, but the maximum value is 0.5: