pythonccython

Most efficient way to build a 1-d array/list/vector of unknown length using Cython? Or should this never be done?


I have a time-critical model that I wrote in Cython. The main function of my Cython extension has one loop, and according to the Cython profiler (where it shows the amount of Python calls in shades of yellow) the only 'yellow' part is currently where I'm appending to a Python list. (I have to output a Python object as I'm calling my Cython function in a Python script). This is the basic idea of my function (the rest is superfluous, I've tested every part of this function and the append operation is the bottleneck):

from libc.math cimport log
def main(some args):
    cdef (some vars)

    cdef list OutputList = []

    # NB: all vars have declared types
    for x in range(t):
        (do some Cythonic stuff, some of which uses my cimport-ed log)
        if condition is True:
            OutputList.append(x) # this is the only 'yellow' line in my main loop.
    return OutputList # return Python object to Python script that calls main()

Unfortunately, I don't know the length of my output array/list/vector (whatever I end up using). However, I could set it to 52560, which is what I end up resizing it to down the line in some other Python code. I'd like to get a major speed boost without setting the output array's length, but I will gladly toss that hope if it's holding me back.

I've also tried going with C++ in Cython to use C++ data structures (vector, queue, etc.) but doing so removes my ability to nicely cimport log. I see on the Cython documentation/wiki that you can write a 'shim' module to use pure-C functions in C++ Cython, but I have no idea how to do this and I can't find anything about how to go about that.

Anyway, I welcome all suggestions that adhere to my question:

What is the best way to build a list/array/vector of unknown size in Cython? Or is there a clear alternative (such as settling with a known-length iterable object) that makes moot my unknown-length problem?

Update

The C++ containers did show a speed increase over item assignment, and item assignment did show a speed increase over appending to lists and numpy arrays. The best method would be to use C++ containers while also being able to cimport pure-C functions...this would prevent the slow-down from having to look beyond libc.math for a quick log function.


Solution

  • Appending python lists is a well optimized operation in CPython. Python does not allocate memory for each element, but incrementally growing arrays of Pointers to the objects in the list. So just switching to Cython will not help you very much here.

    You can use c++ containers within Cython as follows:

    from libc.math cimport log
    from libcpp.list cimport list as cpplist
    
    def main(int t):
    
        cdef cpplist[int] temp
    
        for x in range(t):
            if x> 0:
                temp.push_back(x)
    
        cdef int N = temp.size()
        cdef list OutputList = N*[0]
    
        for i in range(N):
            OutputList[i] = temp.front()
            temp.pop_front()
    
        return OutputList  
    

    You have to test if this speeds things up, but maybe you will not gain to much speed.

    Another way is to use numpy arrays. Here Cython is very good in optimizing code. So if you can live with an numpy array as return value of main, you should consider that, and replace the construction and filling of OutputList by some Cython code allocating and filling a numpy array.

    For more information see http://docs.cython.org/src/tutorial/numpy.html

    Ask if you need help.

    UPDATE: the code should be a bit faster if you avoid method lookup in both loops:

    from libc.math cimport log
    from libcpp.list cimport list as cpplist
    
    def main(int t):
    
        cdef cpplist[int] temp
    
        push_back = temp.push_back
        for x in range(t):
            if x> 0:
                push_back(x)
    
        cdef int N = temp.size()
        cdef list OutputList = N*[0]
    
        front = temp.front()
        pop_front = temp.pop_front()
        for i in range(N):
            OutputList[i] = front()
            pop_front()
    
        return OutputList