algorithmmachine-learningoptimizationdata-sciencesimplex

Finding optimal solution to multivariable function with non-negligible solution time?


So I have this issue where I have to find the best distribution that, when passed through a function, matches a known surface. I have written a script that creates the distribution given some parameters and spits out a metric that compares the given surface to the known, but this script takes a non-negligible time, so I can't just run through a very large set of parameters to find the optimal set of parameters. I looked into the simplex method, and it seems to be the right path, but its not quite what I need, because I dont exactly have a set of linear equations, and dont know the constraints for the parameters, but rather one method that gives a single output (an thats all). Can anyone point me in the right direction to how to solve this problem? Thanks!

To quickly go over my process / problem again, I have a set of parameters (at this point 2 but will be expanded to more later) that defines a distribution. This distribution is used to create a surface, which is compared to a known surface, and an error metric is produced. I want to find the optimal set of parameters, but cannot run through an arbitrarily large number of parameters due to the time constraint.


Solution

  • One situation consistent with what you have asked is a model in which you have a reasonably tractable probability distribution which generates an unknown value. This unknown value goes through a complex and not mathematically nice process and generates an observation. Your surface corresponds to the observed probability distribution on the observations. You would be happy finding the parameters that give a good least squares fit between the theoretical and real life surface distribution.

    One approximation for the fitting process is that you compute a grid of values in the space output by the probability distribution. Each set of parameters gives you a probability for each point on this grid. The not nice process maps each grid point here to a nearest grid point in the space of the surface. The least squares fit is a quadratic in the probabilities calculated for the first grid, because the probabilities calculated for a grid point in the surface are the sums of the probabilities calculated for values in the first grid that map to something nearer to that point in the surface than any other point in the surface. This means that it has first (and even second) derivatives that you can calculate. If your probability distribution is nice enough you can use the chain rule to calculate derivatives for the least squares fit in the initial parameters. This means that you can use optimization methods to calculate the best fit parameters which require not just a means to calculate the function to be optimized but also its derivatives, and these are generally more efficient than optimization methods which require only function values, such as Nelder-Mead or Torczon Simplex. See e.g. http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/optim/package-summary.html.

    Another possible approach is via something called the EM Algorithm. Here EM stands for Expectation-Maximization. It can be used for finding maximum likelihood fits in cases where the problem would be easy if you could see some hidden state that you cannot actually see. In this case the output produced by the initial distribution might be such a hidden state. One starting point is http://www-prima.imag.fr/jlc/Courses/2002/ENSI2.RNRF/EM-tutorial.pdf.