python c++ cython cythonize

How/When to use a Cython Extension Type vs a Cython Struct to store data that is passed around class functions


I am interested in creating a data structure to hold data/information that is passed around to different functions. The way we currently do it is the following code:

# right now we use a C-struct to hold data that is passed
# around in a separate class 'A'
cdef struct Record:
    double threshold       
    double improvement     

cdef class A:

   cpdef py_dostuff(self):
         cdef Record record
         
         record.threshold = new_threshold
         record.improvement = new_improvement
         
         # pass a pointer to the stack-allocated struct
         self.cy_dostuff(&record)

   cdef void cy_dostuff(self, Record* record) nogil:
         do_some_computation(record.threshold, record.improvement)

This uses a C-style struct, which unfortunately doesn’t support inheritance: if we wanted to subclass “A” with another class “B” that uses a “subclass” of the struct, it would not work. My attempt at using a class instead does not work either. Ideally, I would be able to do something like the following without sacrificing performance. My thinking is that it should be possible to replace the struct with a pure Cython extension type, because I'm only using C-level attributes, and the extension type would let me subclass both Record and A.

# now, I would like to use a Cython extension type to hold data that is passed
# around in a separate class 'A'
cdef class Record:
    cdef double threshold       
    cdef double improvement     

cdef class A:

   cpdef py_dostuff(self):
         cdef Record record
         
         record.threshold = new_threshold
         record.improvement = new_improvement
         
         cy_dostuff(&record)

   cdef void cy_dostuff(self, Record record) nogil:
         do_some_computation(record.threshold, record.improvement)

# The reason I would like to use a Cython extension type is that it can then support clean inheritance of the data structure
cdef class NewRecord(Record):
    cdef double threshold       
    cdef double improvement     
    cdef int new_attribute

# E.g. a new subclass of 'A' would still work even if all we did was extend the logic to a "NewRecord"
cdef class B(A):
    cpdef py_dostuff(self):
         cdef NewRecord record
         
         record.threshold = new_threshold
         record.improvement = new_improvement
         record.new_attribute = new_attribute

         cy_dostuff(&record)

    cdef void cy_dostuff(self, Record record) nogil:
         do_some_computation(record.threshold, record.improvement, record.new_attribute)

My questions are:

  1. How can I substitute a pure-Cython class (with no Python objects to allow nogil operations) in place of the struct correctly?
  2. Would there be performance differences?
  3. If I cannot, why not, and what are the workarounds for passing struct-like data structures around?

Solution

  • How can I substitute a pure-Cython class (with no Python objects to allow nogil operations) in place of the struct correctly?

    There's nothing particularly tricky about using a Cython cdef class in place of the struct. You can pass instances to nogil functions and access their non-object cdef attributes without holding the GIL:

    cdef class A:
        cdef double threshold
        cdef double improvement
    
        def example_func(self):
            with nogil:
                self.threshold = do_something(self)
    
    cdef double do_something(A a) nogil:
        return a.threshold
    

    Please consider whether you actually need to work without the GIL (i.e. whether you're doing multi-threading). A lot of people think "nogil == fast" and ask for nogil solutions for largely cargo-cult reasons.

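    Note that you cannot take the address of a cdef class instance: cy_dostuff(&record) would become cy_dostuff(record) in your (non-working) example. Also, cdef Record record only declares a variable (initialised to None), so you would need to create the instance with record = Record() before assigning its attributes.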

  • Would there be performance differences?

    Probably not much. A cdef class is essentially a struct under the hood. Instances are handled by pointer (rather than by value) and allocated on the heap, so that might make a small difference. Cython takes care of the details for you, though.

  • If I cannot, why not and what are workarounds to passing struct-like data structures around?

    You can do "inheritance by composition":

    cdef struct Record:
        double threshold       
        double improvement
    
    cdef struct NewRecord:
        Record base
        int new_attribute
    

    The C standard explicitly allows converting a pointer to a struct into a pointer to its first member (and back) to support this exact use, which is why base has to be the first member of NewRecord. So if you have a function f that takes a Record pointer, you can call f(<Record*>&my_new_record).
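
    As a rough sketch of what that cast looks like at the C level (which is what the Cython structs above compile down to), the following declares the same two structs in plain C. The helpers record_sum and record_extra are hypothetical names made up for illustration. They are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the Cython structs above. */
typedef struct {
    double threshold;
    double improvement;
} Record;

typedef struct {
    Record base;        /* must be the first member for the cast to be valid */
    int new_attribute;
} NewRecord;

/* Compile-time check that base really sits at offset 0. */
static_assert(offsetof(NewRecord, base) == 0, "base must be the first member");

/* Hypothetical helper: only knows about Record. */
static double record_sum(const Record *r) {
    return r->threshold + r->improvement;
}

/* Hypothetical helper: recovers the extended record from a base pointer.
   Only valid if r really points at the base member of a NewRecord. */
static int record_extra(const Record *r) {
    return ((const NewRecord *)r)->new_attribute;
}
```

    Given a NewRecord nr, calling record_sum((Record *)&nr) reads the base fields through the Record pointer. The reverse cast in record_extra is only safe when the pointer really came from a NewRecord, so a function that accepts plain Record values as well needs some other way to tell the two apart.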