I'm trying to maximise a complex function with a grid search over a hypercube. I first tried using NumPy's meshgrid to generate all the function arguments and indexing into it with Python's itertools.product. Unfortunately the result is very (very) slow. I realise that I can't expect much in terms of speed from a grid search, and an "efficient grid search" above dimension 3 or 4 might be a bit of a contradiction in terms, but I thought I might speed the process up a bit by writing it in Julia. This should help at least because my implementation of the function I'm actually trying to maximise with this grid search is significantly faster in Julia. What would be the most efficient way to do this?
Here is the simplest code that searches a 3-dimensional grid (a Cartesian product of 3 parameters). The computation is executed on a Julia cluster with 4 worker processes (you can adjust this to whatever you have on your machine) and the results are collected into a DataFrame.
using Distributed
# Adds 4 workers (and avoids adding more if e.g. rerunning a Jupyter cell)
addprocs(max(0, (4+1)-nprocs()))
@everywhere using Distributed, Random, DataFrames
@everywhere Random.seed!(myid())
@everywhere function your_computation(i, j, k)
# do your complicated computation for your grid search here
3i + 2j + k
end
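# Each worker evaluates its share of the grid points; the per-iteration
# one-row DataFrames are merged with append! into a single DataFrame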
data = @distributed (append!) for (i, j, k) = vec(collect(Iterators.product(1:4, 1:3, 1:2)))
    c = your_computation(i, j, k)
    DataFrame(; i, j, k, c, procid = myid())
end
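Once the loop finishes, the maximiser can be read straight off the collected DataFrame, e.g.:

data[argmax(data.c), :]  # row of the grid point with the largest value of c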
Notes:
- Each element produced by Iterators.product is a tuple, which the loop destructures into the grid coordinates i, j and k.
- Instead of multiprocessing you could use multithreading with Threads.@threads - generally multiprocessing is more scalable though. In a multithreaded version you would need to use locking when appending to the data frame; see the sketch after these notes.
- addprocs is used to add processes to the Julia cluster. Combined with ClusterManagers.jl this can be used to add processes on remote machines. The biggest Julia cluster I have been running with a similar loop was 100 nodes/servers with 8000 logical threads.
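Should you need the multithreaded version, a minimal sketch could look like the following (assuming Julia was started with several threads, e.g. julia -t 4; the ReentrantLock is the locking mentioned above, since a DataFrame must not be mutated from several threads at once):

using DataFrames

function your_computation(i, j, k)
    3i + 2j + k
end

grid = vec(collect(Iterators.product(1:4, 1:3, 1:2)))
data = DataFrame()
lk = ReentrantLock()
Threads.@threads for idx in eachindex(grid)
    i, j, k = grid[idx]
    c = your_computation(i, j, k)
    # DataFrames.jl is not safe for concurrent mutation, hence the lock
    lock(lk) do
        append!(data, DataFrame(; i, j, k, c, threadid = Threads.threadid()))
    end
end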