I have a project where I am mixing C++ and Python code.
For multiple reasons, the frontend needs to be in Python and the backend in C++.
Now I am looking for a way to pass my Python object to C++. One thing to note is that the backend needs to call back into Python at some point to calculate some numbers, and the Python function will return a list of floats.
I have been looking at pybind11 type conversion options defined here: https://pybind11.readthedocs.io/en/stable/advanced/cast/index.html
However, to me it seems like option 1 is kind of easy to use as I can see here: https://pybind11.readthedocs.io/en/stable/advanced/classes.html#overriding-virtual-functions-in-python
So I am wondering, why would someone choose number 3? How does it compare with option 1?
Yes, if the main code is in C++ and the bindings are well fleshed out, then option 1 is the easiest to work with, as in that case the bound C++ objects are as natural to use in Python as native Python classes. It makes life easier because you get full control over object identity and whether or not to copy.
For 3, I'm finding pybind11 to be too aggressive with copying when using callbacks (as seems to be your use case). For example, with numpy arrays it's perfectly possible to work with the buffer on the C++ side if it is verified to be contiguous. Sure, copying safeguards against memory problems, but too little control is given over copying vs. non-copying (numpy has the same problem, to be sure).
The reason why 3 exists is mostly because it improves usability and provides nice syntax. For example, if we have a function with this signature:
void func(const std::vector<int>&)
then it is nice to be able to call it from the Python side as func((1, 2, 3)) or even func(range(3)). It's convenient, easy to use, looks clean, etc. But at that point, there is no way out but to copy, since the memory layout of a tuple is so different from a std::vector (and the range does not even represent an in-memory container).
Note carefully, however, that with the func example above, the caller could still decide to provide a bound std::vector<int> object, and thus pre-empt any copying. It may not look as nice, but there is full control. This is useful, for example, if the vector is a return value from some other function, or is modified in between calls:
v = some_calc() # with v a bound C++ vector
func(v)
v.append(4) # add an element
func(v)
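The difference between the two calling styles can be mimicked in plain Python. This is only a rough sketch under stated assumptions: FakeBoundVector is a hypothetical stand-in for a bound std::vector<int>, and func_converting and func_by_reference are made-up names that imitate what happens on the C++ side. None of it is pybind11 API.

```python
class FakeBoundVector:
    """Hypothetical stand-in for a bound std::vector<int> (not pybind11 API)."""

    def __init__(self, values=()):
        # The "C++ side" storage, filled by copying the given values.
        self._data = [int(x) for x in values]

    def append(self, value):
        self._data.append(int(value))

    def __len__(self):
        return len(self._data)


def func_converting(seq):
    # Imitates automatic conversion: any iterable is accepted, but its
    # contents are always copied into fresh storage before use.
    return FakeBoundVector(seq)


def func_by_reference(vec):
    # Imitates passing a bound object: the callee works on the caller's storage.
    if not isinstance(vec, FakeBoundVector):
        raise TypeError("expected a FakeBoundVector")
    return vec


v = FakeBoundVector(range(3))
same = func_by_reference(v)        # no copy: the very same object is used
copy = func_converting(range(3))   # a new object is built from the range
v.append(4)                        # visible through `same`, not through `copy`
print(same is v, len(same), len(copy))  # prints: True 4 3
```

The point of the sketch is just that the converting path can never observe the later append, while the by-reference path does.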
Contrast this with the case where a list of floats is returned after calculating some numbers, analogous to (but not quite the same as) your description:
std::list<float> calc()
If you choose "option 1", then the bound function calc will return a bound C++ object of std::list<float>. If you choose "option 3", then the bound function calc will return a Python list with the contents of the C++ std::list<float> copied into it.
The problem that arises with "option 3" is that if the caller actually wanted a bound C++ object, then the values need to be copied back into a new C++ list, for a total of 2 copies. OTOH, if you choose "option 1" and the caller wanted a Python list instead, then they are free to do the copy on the return value of calc if desired:
res = calc()
list_res = list(res)
or even, if they want this all the time:
def pycalc():
    return list(calc())
Now finally to your specific case, where it is a Python callback, called from C++, that returns a list of floats. If you use "option 1", then the Python function is forced to create a C++ list to return, so for example (with cpplist the name given to a bound type std::list<float>):
def pycalc():
    return cpplist(range(3))
which a Python programmer would not find pretty. Instead, by choosing "option 3", checking the return type and doing a conversion if needed, this would be valid as well:
def pycalc():
    return [x for x in range(3)]
Depending on the overall requirements and typical use cases then, "option 3" may be more appreciated by your users.
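To make "checking the return type and doing a conversion if needed" a bit more concrete, here is a rough pure-Python sketch of the logic. In a real project this check would live on the C++ side. CppList, call_callback, pycalc_plain and pycalc_bound are hypothetical names chosen for illustration, with CppList standing in for a bound std::list<float>. None of them is pybind11 API.

```python
from collections.abc import Iterable


class CppList:
    """Hypothetical stand-in for a bound std::list<float> (not pybind11 API)."""

    def __init__(self, values=()):
        # Building the object copies the values into its own storage.
        self.values = [float(x) for x in values]


def call_callback(callback):
    # Imitates the backend calling into Python and normalising the result.
    result = callback()
    if isinstance(result, CppList):
        return result              # already the bound type: passed through, no copy
    if isinstance(result, Iterable) and not isinstance(result, (str, bytes)):
        return CppList(result)     # plain Python iterable: converted with one copy
    raise TypeError("callback must return a CppList or an iterable of numbers")


shared = CppList(range(3))


def pycalc_plain():
    return [x for x in range(3)]   # what a Python programmer would naturally write


def pycalc_bound():
    return shared                  # caller opts out of the copy


converted = call_callback(pycalc_plain)
passed_through = call_callback(pycalc_bound)
print(converted.values, passed_through is shared)  # prints: [0.0, 1.0, 2.0] True
```

With this shape, callbacks that return a plain list just work, and callbacks that care about avoiding the copy can still return the bound type.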