I'm trying to solve a two-dimensional Ising model with a Monte Carlo approach.
As it is slow, I used Cython to accelerate the code execution. I would like to push it even further and parallelize the Cython code. My idea is to split the two-dimensional lattice into two sublattices, so that every point on one sublattice has its nearest neighbours on the other sublattice. This way, I can randomly choose one sublattice and flip all of its spins in parallel, since those spins are independent of each other.
So far this is my code (inspired by http://jakevdp.github.io/blog/2017/12/11/live-coding-cython-ising-model/):
```
%load_ext Cython
```

```
%%cython
cimport cython
cimport numpy as np
import numpy as np
from cython.parallel cimport prange

@cython.boundscheck(False)
@cython.wraparound(False)
def cy_ising_step(np.int64_t[:, :] field, float beta):
    cdef int N = field.shape[0]
    cdef int M = field.shape[1]
    cdef int offset = np.random.randint(0, 2)
    cdef np.int64_t[:,] n_update = np.arange(offset, N, 2, dtype=np.int64)
    cdef int m, n, i, j
    for m in prange(M, nogil=True):
        i = m % 2
        for j in range(n_update.shape[0]):
            n = n_update[j]
            cy_spin_flip(field, (n+i) % N, m % M, beta)
    return np.array(field, dtype=np.int64)

cdef cy_spin_flip(np.int64_t[:, :] field, int n, int m, float beta=0.4, float J=1.0):
    cdef int N = field.shape[0]
    cdef int M = field.shape[1]
    cdef float dE = 2*J*field[n,m]*(field[(n-1)%N,m]+field[(n+1)%N,m]+field[n,(m-1)%M]+field[n,(m+1)%M])
    if dE <= 0:
        field[n,m] *= -1
    elif np.exp(-dE * beta) > np.random.rand():
        field[n,m] *= -1
```
I tried using a `prange` constructor, but I'm having a lot of trouble with the GIL. I'm new to Cython and parallel computing, so I could easily have missed something.
The errors:

```
Discarding owned Python object not allowed without gil
Calling gil-requiring function not allowed without gil
```
From a Cython point of view, the main problem is that `cy_spin_flip` requires the GIL. You need to add `nogil` to the end of its signature and set the return type to `void` (since by default it returns a Python object, which requires the GIL).
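In sketch form, the signature would become something like this (the body still needs the changes discussed below):

```
cdef void cy_spin_flip(np.int64_t[:, :] field, int n, int m,
                       float beta=0.4, float J=1.0) nogil:
    ...
```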
However, `np.exp` and `np.random.rand` also require the GIL, because they're Python function calls. `np.exp` is probably easily replaced with `libc.math.exp`. `np.random` is a bit harder, but there are plenty of suggestions for C- and C++-based approaches: 1 2 3 4 (+ others).
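One possible workaround for the random numbers (an assumption on my part, not necessarily what the linked approaches do) is to draw them all with NumPy while the GIL is still held and pass them in, so the `nogil` function only does C-level work:

```
from libc.math cimport exp

cdef void cy_spin_flip(np.int64_t[:, :] field, int n, int m,
                       float beta, float J, float rand_val) nogil:
    # rand_val is a uniform [0, 1) number drawn with np.random
    # *before* entering the nogil region
    cdef int N = field.shape[0]
    cdef int M = field.shape[1]
    cdef float dE = 2*J*field[n,m]*(field[(n-1)%N,m]+field[(n+1)%N,m]
                                    +field[n,(m-1)%M]+field[n,(m+1)%M])
    if dE <= 0:
        field[n,m] *= -1
    elif exp(-dE * beta) > rand_val:   # libc exp, callable without the GIL
        field[n,m] *= -1
```

The caller would then do something like `rands = np.random.rand(N, M)` before the `prange` loop and pass `rands[n, m]` in.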
A more fundamental problem is the line:

```
cdef float dE = 2*J*field[n,m]*(field[(n-1)%N,m]+field[(n+1)%N,m]+field[n,(m-1)%M]+field[n,(m+1)%M])
```
You've parallelized this with respect to `m` (i.e. different values of `m` are run in different threads), and each iteration changes `field`. However, in this line you are looking up several different values of `m`. This means the whole thing is a race condition (the result depends on the order in which the different threads finish), and it suggests your algorithm may be fundamentally unsuitable for parallelization. Or it suggests that you should copy `field` and have `field_in` and `field_out`. It isn't obvious to me which, but this is something that you should be able to work out.
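A minimal sketch of the copy idea (the names `field_in`/`field_out` are just illustrative):

```
cdef np.int64_t[:, :] field_in = np.array(field, dtype=np.int64)  # frozen snapshot, read-only
cdef np.int64_t[:, :] field_out = field                           # the array the threads write to
```

Each thread then reads its neighbours from `field_in` and writes only its own cell in `field_out`, so no thread ever reads a value another thread may be writing.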
Edit: it does look like you've given the race condition some thought by using `i % 2`. It isn't obvious to me that this is right, though. I think a working implementation of your "alternate cells" scheme would look something like:
```
for oddeven in range(2):
    for m in prange(M):
        for n in range(N):
            # some mechanism to pick the alternate cells here.
```
i.e. you need a regular loop to pick the alternate cells outside your parallel loop.
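To make that concrete, here is a hedged sketch of how the whole step might look, combining the pieces above (a `nogil` `cy_spin_flip` and pre-drawn random numbers); it is untested and only meant to show the loop structure:

```
@cython.boundscheck(False)
@cython.wraparound(False)
def cy_ising_step(np.int64_t[:, :] field, float beta):
    cdef int N = field.shape[0]
    cdef int M = field.shape[1]
    # random numbers drawn while the GIL is still held
    cdef double[:, :] rands = np.random.rand(N, M)
    cdef int oddeven, m, n
    for oddeven in range(2):                # regular loop: selects the sublattice
        for m in prange(M, nogil=True):     # parallel over columns
            # every other row, offset so that (n + m) % 2 == oddeven
            for n in range((m + oddeven) % 2, N, 2):
                cy_spin_flip(field, n, m, beta, 1.0, rands[n, m])
    return np.array(field, dtype=np.int64)
```

Within each half-sweep every updated cell has all of its neighbours on the other sublattice, so the threads never touch each other's data.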