python numpy cython memoryview

Cython returned memoryview is always considered uninitialized


This is similar to the question "Cython Memoryview as return value", but I didn't see a solution there other than hacking the generated C code.

I'm using Cython 3.0, but the result looks the same with earlier versions (<3).

Here's an example:

# cython: language_level=3, boundscheck=False, cdivision=True, wraparound=False, initializedcheck=False, nonecheck=False
cimport cython
from cython cimport floating
import numpy as np
cimport numpy as np

np.import_array()

def test_func():
    cdef np.ndarray[float, ndim=2] arr = np.zeros((5, 5), dtype=np.float32)
    cdef float[:, ::1] arr_view = arr
    _run(arr_view)

cdef void _run(floating[:, ::1] arr_view) noexcept nogil:
    cdef floating[:, :] tmp = _get_upper_left_corner(arr_view)

cdef inline floating[:, :] _get_upper_left_corner(floating[:, ::1] arr) noexcept nogil:
    return arr[:-1, :-1]

Then run cython -a cython_test.pyx. The annotated output shows that the _get_upper_left_corner function contains memoryview initialization code, including a GIL acquire, and that the _run function contains error checking, presumably because _get_upper_left_corner could return an error (at least that's my guess):

+17: cdef inline floating[:, :] _get_upper_left_corner(floating[:, ::1] arr) noexcept nogil:

static CYTHON_INLINE __Pyx_memviewslice __pyx_fuse_0__pyx_f_11cython_test__get_upper_left_corner(__Pyx_memviewslice __pyx_v_arr) {
  __Pyx_memviewslice __pyx_r = { 0, 0, { 0 }, { 0 }, { 0 } };
/* … */
  /* function exit code */
  __pyx_L1_error:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1);
  __pyx_r.data = NULL;
  __pyx_r.memview = NULL;
  __Pyx_AddTraceback("cython_test._get_upper_left_corner", __pyx_clineno, __pyx_lineno, __pyx_filename);
  goto __pyx_L2;
  __pyx_L0:;
  if (unlikely(!__pyx_r.memview)) {
    #ifdef WITH_THREAD
    PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
    #endif
    PyErr_SetString(PyExc_TypeError, "Memoryview return value is not initialized");
    #ifdef WITH_THREAD
    __Pyx_PyGILState_Release(__pyx_gilstate_save);
    #endif
  }
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  __pyx_L2:;
  return __pyx_r;
}

I would have assumed that slicing the memoryview would create a new struct. I can live with the struct being initialized if it has to be, but I really don't want the GIL to be acquired. Is there any way to return a memoryview without the GIL? If I need to initialize something, how can I do that without copying the potentially large numpy array (I'm only reading it)?

Edit: I made the example even smaller:

# cython: language_level=3, boundscheck=False, cdivision=True, wraparound=False, initializedcheck=False, nonecheck=False

cdef float[:] get_upper_left_corner(float[:] arr) noexcept nogil:
    return arr[:2]

Edit 2: I noticed that in my original code, the caller (_run in this case) always acquires the GIL at the end, even in the success case:

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1);
  __Pyx_WriteUnraisable("cython_test._run", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 0);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  __pyx_L0:;
  #ifdef WITH_THREAD
  __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 1);
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif

Solution

  • I can live with the struct being initialized if it has to be, but I really don't want the GIL to be acquired

    The GIL is only acquired on the error path (i.e. if you attempt to take an invalid slice). If you look at the generated C code you'll see that every GIL acquisition is guarded by:

    if (unlikely(!__pyx_r.memview))
    

    or just after the label

      __pyx_L1_error:;
    

    Successfully slicing a memoryview does not use the GIL.

    Slicing a memoryview involves a small amount of reference counting. This is atomic, so it doesn't require the GIL, but it isn't ultra-cheap. So it's probably still a good idea to keep slicing out of your inner loop.


    Some details that probably should have gone in the original answer:

    The part of memoryview access that is really optimized is access to individual array elements, not slicing. The actual slicing is done by a utility-code function, __pyx_memoryview_slice_memviewslice, which doesn't depend on which Cython directives you've set.

    It doesn't look like the C code is actually any different with/without the boundscheck/wraparound directives set. Some of that makes sense when you think about it - the main purpose of boundscheck is to avoid looking up the memoryview size, but here you can't avoid it since it needs to set the size of the result memoryview.
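
As a rough analogy only, the same view semantics can be seen with Python's built-in memoryview. This is not Cython's typed memoryview, and the array object below is just a stand-in for the NumPy array in the question. The sketch shows that slicing yields a new view with its own shape while the underlying data is shared rather than copied.

```python
from array import array

# Stand-in for the float32 NumPy array in the question (assumption:
# any object exposing the buffer protocol is good enough for this demo).
data = array("f", [0.0, 1.0, 2.0, 3.0, 4.0])
view = memoryview(data)

# Slicing builds a new view with its own shape; no elements are copied.
head = view[:2]

# Writing through the slice is visible in the original buffer,
# which shows that both views share the same memory.
head[0] = 42.0
```

After this runs, head has shape (2,), view still has shape (5,), and data[0] holds the value written through the slice.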


    About the _run:

      #ifdef WITH_THREAD
      __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
      #endif
      __PYX_XCLEAR_MEMVIEW(&__pyx_v_tmp, 1);
      #ifdef WITH_THREAD
      __Pyx_PyGILState_Release(__pyx_gilstate_save);
      #endif
    

    I think that could be improved, since acquiring the GIL there is probably unnecessary. Note that __PYX_XCLEAR_MEMVIEW may acquire the GIL itself, but only when deallocating the final reference to the memoryview internals. The second argument (the 1) tells it whether it needs to do that or whether it definitely has the GIL. So it isn't 100% GIL-free (although in the case of your _run function it won't be the final reference).
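
The release behaviour described above can be modelled with a toy, single-threaded Python sketch. It is not Cython's implementation: ToyMemview, acquisition_count, lock_acquisitions and the have_lock flag are made-up names, and a threading.Lock merely plays the role of the GIL. The point is only that the lock is taken when the final reference is dropped, and not before.

```python
import threading


class ToyMemview:
    """Hypothetical stand-in for the shared memoryview internals."""

    def __init__(self, lock):
        self.lock = lock              # plays the role of the GIL
        self.acquisition_count = 1    # one owner to start with
        self.lock_acquisitions = 0    # how often release() took the lock
        self.freed = False

    def acquire(self):
        # In the real C code this increment is atomic, so no lock is needed.
        self.acquisition_count += 1

    def release(self, have_lock):
        self.acquisition_count -= 1
        if self.acquisition_count > 0:
            return                    # not the final reference: no lock
        if have_lock:
            self.freed = True         # caller already holds the lock
        else:
            with self.lock:           # final reference: take the lock
                self.lock_acquisitions += 1
                self.freed = True


gil = threading.Lock()
mv = ToyMemview(gil)
mv.acquire()                  # e.g. a slice taking a second reference
mv.release(have_lock=False)   # drops the slice: count 2 -> 1, lock untouched
after_first = (mv.freed, mv.lock_acquisitions)
mv.release(have_lock=False)   # final reference: the lock is taken once
after_last = (mv.freed, mv.lock_acquisitions)
```

In this model, dropping the slice's reference never touches the lock; only the last release does, which mirrors why the temporary in _run is not expected to trigger a GIL acquire inside __PYX_XCLEAR_MEMVIEW.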

    The reference counting for memoryviews has been slightly rearranged for Cython 3.0 (mainly to avoid a few corner cases where references could be leaked) so it's not completely surprising that it's changed from 0.29.x.

    Edit: Resolved GitHub issue: https://github.com/cython/cython/issues/5670