I have the following coupled system of ODEs (that come from discretizing an integrodifferential PDE):
The xi's are points on an x-grid that I control. I can solve this with the following simple piece of code:
using DifferentialEquations
function ode_syst(du,u,p, t)
N = Int64(p[1])
beta= p[2]
deltax = 1/(N+1)
xs = [deltax*i for i in 1:N]
for j in 1:N
du[j] = -xs[j]^(beta)*u[j]+deltax*sum([u[i]*xs[i]^(beta) for i in 1:N])
end
end
N = 1000
u0 = ones(N)
beta = 2.0
p = [N, beta]
tspan = (0.0, 10^3);
prob = ODEProblem(ode_syst,u0,tspan,p);
sol = solve(prob);
However, as I make my grid finer, i.e. increase N, the computation time grows rapidly (I guess the scaling is quadratic in N). Is there any suggestion on how to implement this using either distributed parallelism or multithreading?
Additional information: I attach the profiling diagram that might be useful to understand where the program spends most of the time
I've looked into your code and found a few problems such as an accidentally introduced O(N^2) behavior due to recalculating the sum term.
My improved version uses the Tullio package to get further speed up from vectorization. Tullio also has tuneable parameters which would allow for multi threading if your system becomes big enough. See here what parameters you can tune in the options section. You might also see GPU support there, i didn't test that but it might yield further speed up or break hooribly. I also choose get the length from the acutal array which should make the use more economical and less error prone.
using Tullio
function ode_syst_t(du,u,p, t)
N = length(du)
beta = p[1]
deltax = 1/(N+1)
@tullio s := deltax*(u[i]*(i*deltax)^(beta))
@tullio du[j] = -(j*deltax)^(beta)*u[j] + s
return nothing
end
Your code:
@btime sol = solve(prob);
80.592 s (1349001 allocations: 10.22 GiB)
My code:
prob2 = ODEProblem(ode_syst_t,u0,tspan,[2.0]);
@btime sol2 = solve(prob2);
1.171 s (696 allocations: 18.50 MiB)
and the result more or less agree:
julia> sum(abs2, sol2(1000.0) .- sol(1000.0))
1.079046922815598e-14
Lutz Lehmanns solution i also benchmarked:
prob3 = ODEProblem(ode_syst_lehm,u0,tspan,p);
@btime sol3 = solve(prob3);
1.338 s (3348 allocations: 39.38 MiB)
However as we scale N to 1000000 with a tspan of (0.0, 10.0)
prob2 = ODEProblem(ode_syst_t,u0,tspan,[2.0]);
@time solve(prob2);
2.429239 seconds (280 allocations: 869.768 MiB, 13.60% gc time)
prob3 = ODEProblem(ode_syst_lehm,u0,tspan,p);
@time solve(prob3);
5.961889 seconds (580 allocations: 1.967 GiB, 11.08% gc time)
My code becomes more than twice as fast due to using the 2 cores in my old and rusty machine.