optimization · cuda · genetic-algorithm · nonlinear-optimization · cub

Cost functional calculation for global optimization in CUDA


I am trying to optimize a function (say, find its minimum) with n parameters (X1…Xn). All Xi are bound to a certain range (for example, -200 to 200), and if any parameter leaves this range, the function goes to infinity very quickly. However, n can be large (from 20 to about 60-70), and computing the function's value takes a long time.

I don't think the details of the function are very relevant, but here are some: it consists of a weighted sum of 20-30 smaller functions (all different), each of which in turn consists of sums of dot products under an inverse trigonometric function (arcsin, arccos, arctan, etc.). Something like arcsin(X1 . X2) + arcsin(X4 . X7) + ....
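To make the structure concrete, here is a minimal sketch of an objective with this shape. Everything here is hypothetical: the pairing of parameters, the weights, the `tanh` rescaling (used only so the `asin` argument stays in [-1, 1]), and the out-of-bounds penalty are stand-ins for the real function, which presumably keeps its dot products in range by construction.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical objective: a weighted sum of arcsin terms over products of
// parameters, with a large penalty once any parameter leaves [-200, 200],
// mimicking the "goes to infinity" behaviour near the bounds.
double objective(const std::vector<double>& x,
                 const std::vector<double>& w)   // one weight per term
{
    const double lo = -200.0, hi = 200.0;
    for (double xi : x)
        if (xi < lo || xi > hi)
            return 1e30;                         // out-of-bounds penalty

    double sum = 0.0;
    for (std::size_t k = 0; k + 1 < x.size(); k += 2) {
        // Squash the product into [-1, 1] so asin is defined; the real
        // function presumably guarantees this by construction.
        double dot = std::tanh(x[k] * x[k + 1]);
        sum += w[k / 2] * std::asin(dot);
    }
    return sum;
}
```

Each term is independent of the others, which matters later: the inner sum itself can be parallelized, not just the population of candidate solutions.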

The function has many local minima in general, so approaches such as (naive) conjugate gradient or quasi-Newton methods are useless. A brute-force search of the entire domain is too slow.

My initial idea was to use some sort of massive parallelization in combination with a genetic algorithm that performs many searches at different spots in the function's domain and, at regular intervals, checks whether some of the searches have reached local minima. If so, it compares them, discards all results but the smallest, and continues the search until a reasonably small value is found.
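The multi-start idea in the question can be sketched as follows. This is a toy serial version under stated assumptions: `sphere` stands in for the real objective, the "local search" is a crude random-walk hill climb, and the pruning step simply clones the best point into every search slot; a real implementation would run the searches in parallel and respawn losers at fresh random points.

```cpp
#include <algorithm>
#include <random>
#include <vector>

double sphere(const std::vector<double>& x) {   // toy stand-in objective
    double s = 0.0;
    for (double v : x) s += v * v;
    return s;
}

std::vector<double> multi_start(int n_dims, int n_searches, int n_rounds) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> box(-200.0, 200.0);
    std::normal_distribution<double> step(0.0, 1.0);

    // Start each search at a random point inside the bounded domain.
    std::vector<std::vector<double>> pts(
        n_searches, std::vector<double>(n_dims));
    for (auto& p : pts)
        for (double& v : p) v = box(rng);

    for (int round = 0; round < n_rounds; ++round) {
        // Advance every search with a few random-walk improvement steps.
        for (auto& p : pts)
            for (int s = 0; s < 100; ++s) {
                auto q = p;
                for (double& v : q) v += step(rng);
                if (sphere(q) < sphere(p)) p = q;   // accept only improvements
            }
        // Periodic check: compare the searches and keep only the best point.
        auto best = *std::min_element(pts.begin(), pts.end(),
            [](const std::vector<double>& a, const std::vector<double>& b) {
                return sphere(a) < sphere(b);
            });
        for (auto& p : pts) p = best;
    }
    return pts.front();
}
```

The per-search loop bodies are independent between pruning steps, which is exactly the parallelism the answer below discusses.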

My two questions are:

1) Is it possible to implement this problem in CUDA or a similar technology? Can CUDA compute the value of a function like this fast enough?

2) Would it be better/faster to implement the problem on a multicore PC (with 12+ cores)?


Solution

  • It is certainly possible to implement global optimization algorithms on GPUs, but you may have to modify the algorithms to get the best performance. I have personally implemented about a half dozen population-based metaheuristics on GPUs, and it is possible to get excellent speedups relative to a multi-threaded CPU implementation.

    With population-based algorithms iterated over generations, you may find that the population size becomes the limiting factor in your performance. If your objective function is not easily parallelizable, then the maximum number of parallel threads is generally the number of candidate solutions in the population. GPUs work best with, at minimum, tens of thousands of simultaneous threads, which is not always practical for population-based algorithms due to population inertia and other effects.
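The "one parallel thread per candidate" mapping can be illustrated with a CPU sketch. The names and the toy objective are hypothetical; on a GPU, the loop body below would be the CUDA kernel, with one thread per candidate, which is why the population size caps the parallel width unless the objective itself is also parallelized.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy objective with arcsin-style terms, as in the question.
double toy_objective(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += std::asin(std::tanh(v));
    return s;
}

std::vector<double> evaluate_population(
        const std::vector<std::vector<double>>& pop) {
    std::vector<double> fitness(pop.size());
    // Each iteration is independent: on a GPU this loop becomes a kernel
    // launch with one thread per candidate, so the number of simultaneous
    // threads equals the population size.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(pop.size()); ++i)
        fitness[i] = toy_objective(pop[static_cast<std::size_t>(i)]);
    return fitness;
}
```

A population of a few hundred candidates gives a few hundred threads, far below the tens of thousands a GPU wants; evaluating the independent terms of each objective in parallel as well is one way to close that gap.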

    You can get around the population limitations to some extent by running multiple instances of the optimization in parallel, each starting with a different random seed. Further, you can arrange for the parallel instances to communicate over some network topology (this would be an island model algorithm), so that the parallel instances can collaboratively evolve solutions to complex problems. I have actually implemented this in OpenCL for my MScE thesis.
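A minimal sketch of the migration step in such an island model, assuming a ring topology and reducing each "individual" to just its fitness value for clarity (a real implementation would migrate whole parameter vectors, and the instances would typically run asynchronously):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ring-topology migration: each island sends a copy of its best (lowest)
// individual to its neighbour, which replaces its worst individual if the
// immigrant is better. Minimization is assumed throughout.
void migrate_ring(std::vector<std::vector<double>>& islands) {
    const std::size_t n = islands.size();

    // Snapshot emigrants first so migration order does not matter.
    std::vector<double> emigrants(n);
    for (std::size_t i = 0; i < n; ++i)
        emigrants[i] = *std::min_element(islands[i].begin(),
                                         islands[i].end());

    for (std::size_t i = 0; i < n; ++i) {
        auto& dest = islands[(i + 1) % n];                 // ring neighbour
        auto worst = std::max_element(dest.begin(), dest.end());
        *worst = std::min(*worst, emigrants[i]);           // replace if better
    }
}
```

The topology and migration policy (how often, how many individuals, replace-worst vs. replace-random) are tuning knobs; a ring is merely the simplest common choice.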

    So overall, here are succinct answers to your questions:

    1) Yes, you can implement this in CUDA or a similar technology. Your speedup will depend on how parallelizable your objective function is and how many parallel instances you want to run at the same time.

    2) It would almost certainly be faster to implement on a multicore CPU, due to the wider range of existing libraries and the fact that CPU programming models are conceptually simpler than GPU models. Whether or not it is "better" depends on what you value.

    This is still an area of active research, and building a working GPU implementation will probably take you longer than a multicore CPU implementation would. If implementation time is your primary concern, I recommend taking a look at the PaGMO project (and its Python bindings, PyGMO), which is an excellent implementation of an island-model optimizer with a wide range of local and global optimization algorithms included. The islands can be assigned any arbitrary algorithm to run independently, and you can specify exactly how they communicate to collaboratively optimize your objective function.

    http://sourceforge.net/apps/mediawiki/pagmo/index.php?title=Main_Page

    Your question is well within my research area, and I would be happy to help you further if you need it.