Tags: python, numpy, scipy, vectorization, bessel-functions

Vectorized spherical Bessel functions in Python?


I noticed that the scipy.special Bessel function of order n and argument x, jv(n, x), is vectorized in x:

In [14]: import scipy.special as sp

In [16]: sp.jv(1, range(3))   # n=1, [x=0,1,2]
Out[16]: array([ 0., 0.44005059, 0.57672481])

But there's no corresponding vectorized form of the spherical Bessel functions, sp.sph_jn:

In [19]: sp.sph_jn(1,range(3)) 

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-ea59d2f45497> in <module>()
----> 1 sp.sph_jn(1,range(3)) #n=1, 3 value array

/home/glue/anaconda/envs/fibersim/lib/python2.7/site-packages/scipy/special/basic.pyc in sph_jn(n, z)
    262     """
    263     if not (isscalar(n) and isscalar(z)):
--> 264         raise ValueError("arguments must be scalars.")
    265     if (n != floor(n)) or (n < 0):
    266         raise ValueError("n must be a non-negative integer.")

ValueError: arguments must be scalars.

Furthermore, the spherical Bessel function computes all orders up to n in one pass: if I ask for the n=5 spherical Bessel function at x=10, it returns orders 0 through 5. It actually returns both jn and its derivative in the same call:

In [21]: sp.sph_jn(5,10)
Out[21]: 
(array([-0.05440211,  0.07846694,  0.07794219, -0.03949584, -0.10558929,
        -0.05553451]),
 array([-0.07846694, -0.0700955 ,  0.05508428,  0.09374053,  0.0132988 ,
        -0.07226858]))

Why does this asymmetry exist in the API, and does anyone know of a library that returns spherical Bessel functions vectorized, or at least computes them more quickly (e.g. in Cython)?


Solution

  • You can write a Cython function to speed up the calculation. The first thing you need to do is get the address of the Fortran routine SPHJ; here is how to do that in Python:

    from scipy import special as sp
    sphj = sp.specfun.sphj    # f2py wrapper around the Fortran routine SPHJ
    
    import ctypes
    # the wrapper exposes the Fortran function's address as a PyCObject (Python 2)
    addr = ctypes.pythonapi.PyCObject_AsVoidPtr(ctypes.py_object(sphj._cpointer))
    
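    In case it helps to see what that wrapper hands back: for a scalar order and argument, sphj returns a tuple (nm, sj, dj) -- the highest order actually computed, the values j_0(x)..j_n(x), and their derivatives. That is why the code below takes the second element and strides by n + 1. A quick check, assuming a SciPy version that still exposes specfun.sphj:

    nm, sj, dj = sphj(5, 10.0)
    print(nm)       # 5
    print(sj)       # j_0(10) ... j_5(10), matching the sph_jn output in the question
    print(sj[5])    # j_5(10), the single element extracted in the loops below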

    Then you can call the Fortran function directly in Cython. Note that I use prange() to spread the work across multiple cores and speed up the calculation:

    %%cython -c-Ofast -c-fopenmp --link-args=-fopenmp
    from cpython.mem cimport PyMem_Malloc, PyMem_Free
    from cython.parallel import prange
    import numpy as np
    import cython
    from cpython cimport PyCObject_AsVoidPtr
    from scipy import special
    
    # C signature matching the Fortran routine SPHJ(N, X, NM, SJ, DJ)
    ctypedef void (*sphj_ptr) (const int *n, const double *x,
                               const int *nm, const double *sj, const double *dj) nogil
    
    # grab the raw function pointer from the f2py wrapper (PyCObject, Python 2)
    cdef sphj_ptr _sphj = <sphj_ptr>PyCObject_AsVoidPtr(special.specfun.sphj._cpointer)
    
    
    @cython.wraparound(False)
    @cython.boundscheck(False)
    def cython_sphj2(int n, double[::1] x):
        cdef int count = x.shape[0]
        cdef double * sj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
        cdef double * dj = <double *>PyMem_Malloc(count * sizeof(double) * (n + 1))
        cdef int * mn    = <int *>PyMem_Malloc(count * sizeof(int))
        cdef double[::1] res = np.empty(count)
        cdef int i
        # plain serial loop for small arrays, where thread start-up overhead would likely dominate
        if count < 100:
            for i in range(x.shape[0]):
                _sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
                res[i] = sj[i*(n+1) + n]    #choose the element you want here        
        else:
            for i in prange(count,  nogil=True):
                _sphj(&n, &x[i], mn + i, sj + i*(n+1), dj + i*(n+1))
                res[i] = sj[i*(n+1) + n]    #choose the element you want here
        PyMem_Free(sj)
        PyMem_Free(dj)
        PyMem_Free(mn)
        return res.base
    

    For comparison, here is a Python function that calls sphj() in a plain loop:

    import numpy as np
    def python_sphj(n, x):
        sphj = special.specfun.sphj
        res = np.array([sphj(n, v)[1][n] for v in x])   # sphj returns (nm, sj, dj); take sj[n]
        return res
    

    Here are the %timeit results for 10 elements:

    x = np.linspace(1, 2, 10)
    r1 = cython_sphj2(4, x)
    r2 = python_sphj(4, x)
    assert np.allclose(r1, r2)
    %timeit cython_sphj2(4, x)
    %timeit python_sphj(4, x)
    

    the result:

    10000 loops, best of 3: 21.5 µs per loop
    10000 loops, best of 3: 28.1 µs per loop
    

    Here are the results for 100000 elements:

    x = np.linspace(1, 2, 100000)
    r1 = cython_sphj2(4, x)
    r2 = python_sphj(4, x)
    assert np.allclose(r1, r2)
    %timeit cython_sphj2(4, x)
    %timeit python_sphj(4, x)
    

    the result:

    10 loops, best of 3: 44.7 ms per loop
    1 loops, best of 3: 231 ms per loop
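
    So the Cython version is roughly 5x faster on the large array, mostly thanks to prange().

    If you would rather avoid Cython entirely, you can also lean on the identity j_n(x) = sqrt(pi / (2x)) * J_{n+1/2}(x) and reuse the already-vectorized jv from the question. A minimal sketch, not benchmarked against the timings above (newer SciPy releases also ship a vectorized scipy.special.spherical_jn, if upgrading is an option):

    import numpy as np
    from scipy import special as sp
    
    def jv_sphj(n, x):
        """Spherical Bessel j_n(x) via the half-integer-order cylindrical Bessel J."""
        x = np.asarray(x, dtype=float)
        # j_n(x) = sqrt(pi / (2 x)) * J_{n + 1/2}(x); the formula breaks down at x = 0
        with np.errstate(divide='ignore', invalid='ignore'):
            out = np.sqrt(np.pi / (2.0 * x)) * sp.jv(n + 0.5, x)
        # patch the x = 0 limit by hand: j_0(0) = 1, j_n(0) = 0 for n >= 1
        return np.where(x == 0.0, 1.0 if n == 0 else 0.0, out)
    
    print(jv_sphj(5, 10.0))   # about -0.05553451, matching sp.sph_jn(5, 10) in the question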