
What is the difference between pybind11's type conversion options?


I have a project where I am mixing C++ and Python code.

For multiple reasons, the frontend needs to be in Python and the backend in C++.

Now I am looking for a way to pass my Python objects to C++. Note that at some point the backend needs to call back into Python to calculate some numbers, and the Python function will return a list of floats.

I have been looking at the pybind11 type conversion options described here: https://pybind11.readthedocs.io/en/stable/advanced/cast/index.html

Option 1 seems fairly easy to use, judging from this example: https://pybind11.readthedocs.io/en/stable/advanced/classes.html#overriding-virtual-functions-in-python

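So I am wondering: why would someone choose option 3 (a native C++ type on the C++ side and a native Python type on the Python side, with a conversion in between) over option 1 (a native C++ type everywhere, exposed to Python through bindings)? How do the two compare?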


Solution

  • Yes, if the main code is in C++ and the bindings are well fleshed out, then option 1 is the easiest to work with, as in that case the bound C++ objects are as natural to use in Python as native Python classes. It makes life easier because you get full control over object identity and whether or not to copy.

    With option 3, I find pybind11 too aggressive about copying when callbacks are involved (as seems to be your use case). For example, with numpy arrays it is perfectly possible to work with the buffer directly on the C++ side once it has been verified to be contiguous. Sure, copying safeguards against memory problems, but too little control is given over copying vs. not copying (numpy has the same problem, to be sure).
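    To make the copy vs. no-copy distinction concrete without building an extension module, here is a pure-Python analogy (an illustration only, not pybind11 API). A `memoryview` stands in for C++ code that works on the caller's contiguous buffer in place, and building a new list stands in for a type caster. The names `scale_in_place` and `scale_copy` are hypothetical.

```python
from array import array


def scale_in_place(buf, factor):
    """Analogy for C++ code working directly on the caller's memory via the
    buffer protocol: nothing is copied, so the caller sees the change."""
    with memoryview(buf) as view:
        # Only accept the layout we can safely work with in place.
        if view.ndim != 1 or not view.c_contiguous or view.format != "d":
            raise TypeError("expected a 1-D contiguous buffer of doubles")
        for i in range(len(view)):
            view[i] = view[i] * factor  # writes go straight into the caller's buffer


def scale_copy(values, factor):
    """Analogy for a type caster: the values are copied into a new container,
    so the caller's object is left untouched."""
    return [v * factor for v in values]


data = array("d", [1.0, 2.0, 3.0])
scale_in_place(data, 2.0)      # data itself is modified
scaled = scale_copy(data, 10.0)  # a new list; data is unchanged
```

    On the pybind11 side, the in-place route roughly corresponds to accepting a buffer object (e.g. `py::buffer`) instead of an STL container; the sketch only mimics the effect.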

    Option 3 exists mostly because it improves usability and allows for nice syntax. For example, given a function with this signature:

    void func(const std::vector<int>&);
    

    it is nice to be able to call it from Python as func((1, 2, 3)) or even func(range(3)). It is convenient, easy to use, and looks clean. But at that point there is no way around copying, since the memory layout of a tuple is very different from that of a std::vector (and a range does not even represent an in-memory container).
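    A rough pure-Python analogy of what the conversion does for func((1, 2, 3)) or func(range(3)) is sketched below. `cast_to_int_vector` is a hypothetical name, and its acceptance rules (any sequence except `str`/`bytes`, integer elements only) only approximate pybind11's STL casters. The point it illustrates is that the result is always a freshly built container, never the caller's object.

```python
from collections.abc import Sequence


def cast_to_int_vector(obj):
    """Hypothetical analog of an STL type caster for std::vector<int>:
    every element is copied into a brand-new container."""
    # Strings and bytes are sequences too, but are not treated as containers here.
    if not isinstance(obj, Sequence) or isinstance(obj, (str, bytes)):
        raise TypeError(f"cannot convert {type(obj).__name__} to std::vector<int>")
    result = []
    for item in obj:
        if not isinstance(item, int):
            raise TypeError(f"cannot convert element {item!r} to int")
        result.append(item)  # element-by-element copy
    return result


from_tuple = cast_to_int_vector((1, 2, 3))
from_range = cast_to_int_vector(range(3))  # a range has no backing storage to share
```

    Because the tuple and the range have nothing in common with a contiguous vector of ints, building a new container is the only option, which is exactly the copy described above.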

    Note, however, that with the func example above, the caller can still provide a bound std::vector<int> object and thus avoid any copying (this requires the vector type to be made opaque with PYBIND11_MAKE_OPAQUE and bound, e.g. with py::bind_vector; otherwise, with pybind11/stl.h included, it would be converted anyway). It may not look as nice, but it gives full control. This is useful, for example, if the vector is the return value of some other function, or is modified between calls:

    v = some_calc()   # with v a bound C++ vector
    func(v)
    v.append(4)       # add an element
    func(v)
    

    Contrast this with the case where a list of floats is returned after calculating some numbers, analogous to (though not quite the same as) your description:

    std::list<float> calc();
    

    If you choose "option 1", the bound function calc returns a bound C++ std::list<float> object. If you choose "option 3", calc returns a Python list with the contents of the C++ std::list<float> copied into it.

    The problem with "option 3" is that if the caller actually wanted a bound C++ object, the values have to be copied again into a new bound container, for a total of two copies. On the other hand, if you choose "option 1" and the caller wants a Python list instead, they are free to make that copy on the return value of calc:
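    The copy arithmetic can be sketched in pure Python. `CppList` is a hypothetical stand-in for a bound std::list<float>, and `CopyCounter` simply counts how often the whole sequence of values is copied; neither is pybind11 API.

```python
class CppList:
    """Hypothetical stand-in for a bound std::list<float>."""

    def __init__(self, values=()):
        self._items = [float(v) for v in values]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class CopyCounter:
    """Counts how many times the full sequence of values gets copied."""

    def __init__(self):
        self.count = 0

    def copy(self, values, container):
        self.count += 1
        return container(values)


def calc_for_bound_caller(option, counter):
    """A caller that needs a bound CppList, under "option 1" or "option 3"."""
    result = CppList([0.5, 1.5, 2.5])  # what the C++ calc() computed
    if option == 1:
        return result  # the bound object is handed over, nothing copied
    as_list = counter.copy(result, list)  # option 3: the caster copies into a Python list
    return counter.copy(as_list, CppList)  # the caller copies back into a bound container


c1 = CopyCounter()
r1 = calc_for_bound_caller(1, c1)
c3 = CopyCounter()
r3 = calc_for_bound_caller(3, c3)
```

    Here `c1.count` ends up as 0 and `c3.count` as 2. A caller that wants a plain Python list pays exactly one copy either way: made by the caster under "option 3", or by `list(res)` under "option 1".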

    res = calc()
    list_res = list(res)
    

    or even, if they want this all the time:

    def pycalc():
        return list(calc())
    

    Finally, to your specific case: a Python callback, called from C++, that returns a list of floats. If you use "option 1", the Python function is forced to create a C++ list to return, for example (with cpplist being the Python name given to a bound std::list<float>):

    def pycalc():
        return cpplist(range(3))
    

    which a Python programmer would not find pretty. If instead you choose "option 3", i.e. check the return type and convert if needed, this would be valid as well:

    def pycalc():
        return [x for x in range(3)]
    

    Depending on the overall requirements and typical use cases then, "option 3" may be more appreciated by your users.
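    The "check the return type and convert if needed" approach for the callback can be sketched in pure Python too. `CppList`, `call_strict`, `call_lenient` and the two `pycalc_*` callbacks are hypothetical names: `call_strict` models a backend that accepts only the bound type ("option 1"), while `call_lenient` uses a bound result as-is and otherwise converts it with a single copy ("option 3" with a type check).

```python
class CppList:
    """Hypothetical stand-in for a bound std::list<float>, exposed to Python as `cpplist`."""

    def __init__(self, values=()):
        self._items = [float(v) for v in values]  # copies (and converts) the values

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def call_strict(callback):
    """Option 1 only: the backend accepts nothing but the bound type."""
    result = callback()
    if not isinstance(result, CppList):
        raise TypeError(f"callback must return cpplist, got {type(result).__name__}")
    return result


def call_lenient(callback):
    """Option 3 with a type check: bound results pass through, anything else is converted once."""
    result = callback()
    if isinstance(result, CppList):
        return result  # already the bound type: no copy
    return CppList(result)  # plain Python iterable: one copy


def pycalc_bound():
    return CppList(range(3))  # what the user must write under "option 1"


def pycalc_plain():
    return [x for x in range(3)]  # what a Python programmer would rather write
```

    With `call_lenient`, both callbacks work and a bound result is never copied; `call_strict(pycalc_plain)` raises a `TypeError`.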