Let x_1, ..., x_i, ..., x_p be p real numbers such that 0 <= x_i <= b for all i; that is, each x_i can take any value between 0 and b. I'd like to find the values of the x_i that maximize the variance among them.
Do you have any hints? I'd like to use this result for my example code. Or is this question not well-defined?
At first I thought of something like x_1 = 0, x_2 = ... = x_p = b, but then I found that this does not maximize the variance once p gets a bit larger.
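For instance, with p = 10 and b = 10, a quick numpy check (population variance) shows that a half-and-half split already does better:

import numpy as np

p, b = 10, 10
print(np.var([0] + [b] * (p - 1)))              # 9.0  -- one value at 0, the rest at b
print(np.var([0] * (p // 2) + [b] * (p // 2)))  # 25.0 -- half at 0, half at b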
Thanks
After the comments, I did some trials on a numerical solution to your problem. There's still some work to do, but I hope it puts you on the right track. Besides, I've used Python, and I have no idea if that's ok for you or not; you can surely find equivalent ways to do it in MATLAB and R.
I use the well-known identity variance = E[X^2] - E[X]^2 to make the derivatives easier (if you have doubts, check wiki).
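You can convince yourself of the identity numerically (a quick check of my own; note that np.var computes the population variance by default, which is the one we want here):

import numpy as np

x = np.array([1.0, 2.0, 8.0, 7.0])
print(np.var(x))                      # 9.25
print(np.mean(x**2) - np.mean(x)**2)  # 9.25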
The Python package scipy.optimize has a method minimize to minimize a function numerically. You can select an algorithm for solving the problem; I'm not so familiar with the possible algorithms, and while I was looking for the well-known plain gradient descent (well, at least I hope you know it), I think a close one could be SLSQP, but honestly I'm not 100% sure on the details.
And finally, I didn't make sure that the function you're minimizing is convex, or figure out whether it has local minima, but the results look fine.
I give you the code in Python below, in case it is useful. The bottom line is that I'd suggest you write the (negated) variance as a function, derive its gradient analytically, check the gradient numerically, and then run the minimizer; that is exactly what the code below does. Hope it helps.
I'm not going to spell out every step of the algebra for the derivatives (I hope you can work them out yourself), but a short sketch follows. And you must take into account that you are maximizing and not minimizing, so you have to multiply by -1, as explained (I hope quite clearly) here (look for "maximizing").
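Sketching the algebra: with f(x) = -((1/n) * sum_j x_j^2 - ((1/n) * sum_j x_j)^2) for n values, each partial derivative is

df/dx_i = -(2*x_i/n - (2/n^2) * sum_j x_j)

since the sum of squares contributes 2*x_i/n and the squared mean contributes 2 * mean(x) * (1/n). (This is my summary of the working, so double-check it against func_deriv below.)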
Setup,
In [1]:
from scipy.optimize import minimize
import numpy as np
The function you are maximizing, that is, the variance (remember the trick E[X^2] - E[X]^2, and the -1),
In [86]:
def func(x):
    # negated variance, via E[X^2] - E[X]^2 (negated because minimize minimizes)
    return (-1) * (np.mean([xi**2 for xi in x]) - np.mean(x)**2)
The derivative of that function for each of the xi of the vector x (I hope you can differentiate and get to the same result),
In [87]:
def func_deriv(x):
    # gradient of func: for each i, -(2*x_i/n - (2/n^2) * sum_j x_j)
    n = len(x)
    l = []
    for i in range(n):
        res = (2 * x[i] / n) - ((2 / (n**2)) * (x[i] + sum([x[j] for j in range(n) if j != i])))
        l += [(-1) * res]
    return np.array(l)
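As a side note, x[i] plus the sum over the other entries is just the total sum, so the same gradient can be written without loops. This vectorized version is my rewrite, not part of the original:

def func_deriv_vec(x):
    # same gradient as func_deriv, computed in one shot
    x = np.asarray(x, dtype=float)
    n = len(x)
    return (-1) * (2 * x / n - 2 * np.sum(x) / n**2)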
Actually, I made quite a few mistakes when writing func_deriv, both in the derivative and the Python implementation. But there is a trick that helps a lot, which is to check the derivative numerically, by adding and subtracting a small epsilon in every dimension and calculating the slope of the curve (see wiki). This would be the function that approximates the derivative,
In [72]:
def func_deriv_approx(x, epsilon=0.00001):
    # central-difference approximation of the gradient of func;
    # func already carries the -1 factor, so no extra sign flip is needed here
    l = []
    for i in range(len(x)):
        x_plus = [x[j] + ((j == i) * epsilon) for j in range(len(x))]
        x_minus = [x[j] - ((j == i) * epsilon) for j in range(len(x))]
        res = (func(x_plus) - func(x_minus)) / (2 * epsilon)
        l += [res]
    return l
And then I've checked func_deriv_approx versus func_deriv for a bunch of values.
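A sketch of that check (the exact test points are mine, not from the original run):

import random

x_test = [random.uniform(0, 10) for i in range(10)]
print(np.allclose(func_deriv(x_test), func_deriv_approx(x_test)))  # True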
And the minimization itself. If I initialize the values to the solution we suspect is right, it works ok: it iterates only once and gives the expected results,
In [99]:
res = minimize(func, [0, 0, 10, 10], jac=func_deriv, bounds=[(0, 10) for i in range(4)],
               method='SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: -25.0
Iterations: 1
Function evaluations: 1
Gradient evaluations: 1
In [100]:
print(res.x)
[ 0. 0. 10. 10.]
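(As a sanity check: with two values at 0 and two at 10, the mean is 5 and E[X^2] is 50, so the variance is 50 - 25 = 25, matching the function value of -25.0 above.)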
(Note that you could use any length you want, since func and func_deriv are written in a way that accepts any length.)
You could initialize randomly like this,
In [81]:
import random
xinit = [random.randint(0, 10) for i in range(4)]
In [82]:
xinit
Out[82]:
[1, 2, 8, 7]
And then the maximization is,
In [83]:
res = minimize(func, xinit, jac=func_deriv, bounds=[(0, 10) for i in range(4)],
               method='SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: -25.0
Iterations: 3
Function evaluations: 3
Gradient evaluations: 3
In [84]:
print(res.x)
[ 1.27087156e-13 1.13797860e-13 1.00000000e+01 1.00000000e+01]
Or finally for length = 100,
In [85]:
import random
xinit = [random.randint(0, 10) for i in range(100)]
In [91]:
res = minimize(func, xinit, jac=func_deriv, bounds=[(0, 10) for i in range(100)],
               method='SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: -24.91
Iterations: 23
Function evaluations: 22
Gradient evaluations: 22
In [92]:
print(res.x)
[ 2.49143492e-16 1.00000000e+01 1.00000000e+01 -2.22962789e-16
-3.67692105e-17 1.00000000e+01 -8.83129256e-17 1.00000000e+01
7.41356521e-17 3.45804774e-17 -8.88402036e-17 1.31576404e-16
1.00000000e+01 1.00000000e+01 1.00000000e+01 1.00000000e+01
-3.81854094e-17 1.00000000e+01 1.25586928e-16 1.09703896e-16
-5.13701064e-17 9.47426071e-17 1.00000000e+01 1.00000000e+01
2.06912944e-17 1.00000000e+01 1.00000000e+01 1.00000000e+01
-5.95921560e-17 1.00000000e+01 1.94905365e-16 1.00000000e+01
-1.17250430e-16 1.32482359e-16 4.42735651e-17 1.00000000e+01
-2.07352528e-18 6.31602823e-17 -1.20809001e-17 1.00000000e+01
8.82956806e-17 1.00000000e+01 1.00000000e+01 1.00000000e+01
1.00000000e+01 1.00000000e+01 3.29717355e-16 1.00000000e+01
1.00000000e+01 1.00000000e+01 1.00000000e+01 1.00000000e+01
1.43180544e-16 1.00000000e+01 1.00000000e+01 1.00000000e+01
1.00000000e+01 1.00000000e+01 2.31039883e-17 1.06524134e-16
1.00000000e+01 1.00000000e+01 1.00000000e+01 1.00000000e+01
1.77002357e-16 1.52683194e-16 7.31516095e-17 1.00000000e+01
1.00000000e+01 3.07596508e-17 1.17683979e-16 -6.31665821e-17
1.00000000e+01 2.04530928e-16 1.00276075e-16 -1.20572493e-17
-3.84144993e-17 6.74420338e-17 1.00000000e+01 1.00000000e+01
-9.66066818e-17 1.00000000e+01 7.47080743e-17 4.82924982e-17
1.00000000e+01 -9.42773478e-17 1.00000000e+01 1.00000000e+01
1.00000000e+01 1.00000000e+01 1.00000000e+01 5.01810185e-17
-1.75162038e-17 1.00000000e+01 6.00111991e-17 1.00000000e+01
1.00000000e+01 7.62548028e-17 -6.90706135e-17 1.00000000e+01]
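(A closing observation of mine, so take it with a grain of salt: with a fraction f of the 100 values at b = 10 and the rest at 0, the variance is f*(1-f)*b^2, which peaks at 25 for f = 0.5. The value 24.91 corresponds to f = 0.53, since 0.53 * 0.47 * 100 = 24.91, so the solver seems to have stopped at a slightly uneven split, in line with the local-minima caveat above.)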