performance optimization julia differential-equations stochastic-process

Julia Stochastic Differential Equation Very Slow

Please help me figure out why my code is sooo slow and possible ways to speed it up. I have used vectorization, @inbounds, column major indexing, @floop, and preallocation so I would think it would be faster. I am at a loss...

The code simulates a stochastic wave of cells (as in biological cells) and mutant cells. I am using the Euler-Murayama method to propagate the coupled (Chemical Langevin) equations:

Where W/M denotes the number of wild-type cells and mutant cells respectively, K is the carrying capacity (maximum number of cells), i denotes the deme (location), and N(0,1) is a normal random variable.

Here is a graphic of the waves after some time propagating:

I have attached the important part of the code below, and tried to comment it as best as I could.

using Random, Distributions
using StatsBase
using Statistics
using FLoops

# CLE Parameters/Set-Up:
K = 100 # Carrying capacity (maximum number of cells)
M = 100 # Number of demes (locations)
T = 100_000_000 # number of time steps
dt = 1e-1 # time increment
g = Normal(0.0,sqrt(dt)) # normal distribution with ave = 0.0, std_dev = dt
r_w = 0.1 # Wild-type growth rate
r_m = 0.2 # Mutant growth rate
r_wm = [r_w, r_m]' # Growth rate vector (transposed)
N = 1 # number of independent processes (slow even when N = 1)

# initial wave (essentially a step-function of wild-types 
# with 100 mutants at deme (location) 76)
state_init = Matrix(reshape(repeat([K, 0.0]',M+2),(M+2,2))) 
state_init[M÷2+2:end,1] .= 0   
state_init[76,2] = 100.0
state_init[1,:] .= [K,0]
state_init[end,:] .= [0,0]

state = deepcopy(state_init)
state_plus = zeros(size(state_init)) # state at demes i+1 instead of i (used for derivatives)
state_minus = zeros(size(state_init)) # state at demes i-1 instead of i (used for derivatives)

function sim!(state_init::Matrix{Float64}, state::Matrix{Float64},
              T::Int64, dt::Float64, N::Int64, M::Int64, K::Int64,
              hist_data::Array{Int64,3}, g::Normal{Float64})

    @inbounds @floop for n in 1:N
        state .= deepcopy(state_init) # initialize state
        @inbounds for t in 1:T
            
            state_plus .= circshift(state, -1) # make plus state
            state_plus[1,:] .= [0,0] # fix boundary conditions
            state_minus .= circshift(state, 1) # make minus state
            state_minus[end,:] .= [0,0] # fix boundary conditions
            state_shift = circshift(state, (0,1)) 
            # make state where each deme has a vector 
            # (# mutants, # wild-types) instead of 
            # (# wild-types, # mutants)

            ######################################
            # propagate state using Euler-Murayama method and
            # restrict number of cells in a deme to be in the range [0,K]
            # using clamp(). clamp() also prevents imaginary numbers from 
            # a negative number under the sqrt().

            state .= clamp.(state .+ 
            dt .* (r_wm .* state .* (K .- state .- state_shift) .+ 
            K .* (state_plus .- 2.0 .* state .+ state_minus)) .+ 
            sqrt.(clamp.(
            6 .* state .* (K .- state) .+ 
            (K .- 2.0 .* state) .* (state_plus .- 2.0 .* state .+ 
            state_minus) .- r_wm .* state .* (K .- state .- state_shift), 
            0.0, 1.0*K*K)) .* 
            rand(g,M+2), 0.0, 1.0*K) 
            ######################################

        end
    end
end

sim!(state_init, state, T, dt, N, M, K, hist_data, g)


... the rest of the code is analysis and not the reason the code is slow.

Solution

Even though you're pre-allocating state_plus and state_minus, memory is being allocated inside the for loop since you're using circshift instead of circshift!. circshift allocates new memory regardless of the fact that it's ultimately being assigned to an existing pre-allocated array. Doing that allocation 300 million times is bound to be costly!

Try

function sim!(state_init::Matrix{Float64}, state::Matrix{Float64},
              T::Int64, dt::Float64, N::Int64, M::Int64, K::Int64,
              hist_data::Array{Int64,3}, g::Normal{Float64})

    @inbounds @floop for n in 1:N
        state .= deepcopy(state_init) # initialize state
        state_plus = zeros(size(state_init)) # state at demes i+1 instead of i (used for derivatives)
        state_minus = zeros(size(state_init)) # state at demes i-1 instead of i (used for derivatives)
        state_shift = zeros(size(state_init)) 

        for t in 1:T
            
            circshift!(state_plus, state, -1) # make plus state
            state_plus[1,:] .= [0,0] # fix boundary conditions
            circshift!(state_minus, state, 1) # make minus state
            state_minus[end,:] .= [0,0] # fix boundary conditions
            circshift!(state_shift, state, (0,1))

and

function sim!(state_init::Matrix{Float64}, state::Matrix{Float64},
              T::Int64, dt::Float64, N::Int64, M::Int64, K::Int64,
              hist_data::Array{Int64,3}, g::Normal{Float64})

    state_plus = zeros(size(state_init)) # state at demes i+1 instead of i (used for derivatives)
    state_minus = zeros(size(state_init)) # state at demes i-1 instead of i (used for derivatives)
    state_shift = zeros(size(state_init)) 
    @inbounds for n in 1:N
        state .= deepcopy(state_init) # initialize state
        for t in 1:T
            
            circshift!(state_plus, state, -1) # make plus state
            state_plus[1,:] .= [0,0] # fix boundary conditions
            circshift!(state_minus, state, 1) # make minus state
            state_minus[end,:] .= [0,0] # fix boundary conditions
            circshift!(state_shift, state, (0,1))

The second version doesn't use @floop, but that allows it to not have to initialize state_plus and others N times, so it may be the case that it's faster for your actual N. Best to try both and find out!