I created Python bindings using pybind11. Everything worked perfectly, but when I ran a speed test the result was disappointing.
Basically, I have a C++ function that adds two numbers, and I want to call it from a Python script. I also wrapped the call in a for loop that runs 100 times, to make the difference in processing time easier to see.
For the function "imported" from C++ using pybind11, I get between 0.002310514450073242 and 0.0034799575805664062 seconds.
For the plain Python script, I get between 0.0012788772583007812 and 0.0015883445739746094 seconds.
main.cpp file:
#include <pybind11/pybind11.h>
namespace py = pybind11;
double sum(double a, double b) {
    return a + b;
}
PYBIND11_MODULE(SumFunction, var) {
    var.doc() = "pybind11 example module";
    var.def("sum", &sum, "This function adds two input numbers");
}
main.py file:
from build.SumFunction import *
import time
start = time.time()
for i in range(100):
    print(sum(2.3, 5.2))
end = time.time()
print(end - start)
CMakeLists.txt file:
cmake_minimum_required(VERSION 3.0.0)
project(Projectpybind11 VERSION 0.1.0)
include(CTest)
enable_testing()
add_subdirectory(pybind11)
pybind11_add_module(SumFunction main.cpp)
set(CPACK_PROJECT_NAME ${PROJECT_NAME})
set(CPACK_PROJECT_VERSION ${PROJECT_VERSION})
include(CPack)
Simple Python script:
import time
def summ(a, b):
    return a + b
start = time.time()
for i in range(100):
    print(summ(2.3, 5.2))
end = time.time()
print(end - start)
Benchmarking is a very complicated thing; it could even be called systems engineering, because many other processes can interfere with a benchmark: for example, NIC interrupt handling, keyboard or mouse input, OS scheduling, and so on. I have seen my production process blocked by the OS for up to 15 seconds!
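One way to see how much the OS got in the way of a single measurement is to record CPU time next to wall-clock time: time.process_time() counts only the CPU time of the current process, while time.perf_counter() also counts the time the process spent waiting. A minimal sketch, where busy_work and measure are hypothetical helpers and the workload is an arbitrary CPU-bound loop:

```python
import time
from typing import Tuple

def busy_work(n: int) -> float:
    # Purely CPU-bound loop, so both clocks have something to measure.
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

def measure(n: int) -> Tuple[float, float]:
    # perf_counter(): high-resolution wall-clock time; it also counts time
    # during which the OS was running other work instead of this process.
    # process_time(): CPU time (user + system) of this process only.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    busy_work(n)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed

if __name__ == "__main__":
    wall, cpu = measure(1_000_000)
    # A wall time much larger than the CPU time hints that the process was
    # interrupted or descheduled during the measurement.
    print(f"wall: {wall:.4f} s, cpu: {cpu:.4f} s")
```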
So, as others have pointed out, the print() call inside the loop adds unnecessary interference to the measurement.
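A sketch of the question's loop with the I/O moved out of the timed region, using time.perf_counter() (a clock intended for measuring short durations) instead of time.time(). The helper name bench is hypothetical:

```python
import time

def summ(a, b):
    return a + b

def bench(func, n: int) -> float:
    # Time n calls of func with no I/O inside the timed region.
    result = 0.0
    start = time.perf_counter()
    for _ in range(n):
        result = func(2.3, 5.2)
    elapsed = time.perf_counter() - start
    # Print once, outside the timed region, so the result is still used.
    print(result)
    return elapsed

if __name__ == "__main__":
    print(f"{bench(summ, 100):.6f} s")
```

To time the pybind11 version the same way, pass its function instead, e.g. `from build.SumFunction import sum as cpp_sum` and then `bench(cpp_sum, 100)`, assuming the module is built into build/ as in the question.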
Your test computation is too simple.
Think clearly about what you are comparing. Passing arguments between Python and C++ is obviously slower than passing them within Python. So I assume you want to compare the computing speed of the two sides, not the speed of passing arguments. If so, your computation is too simple: the time we measure is then mostly the time spent passing arguments, and the computation itself is only a minor part of the total. I have put my sample below, and I will be glad to see anyone polish it.
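To see how much of the measured time is call overhead rather than arithmetic, one can time the same addition inline and behind a function call with the standard timeit module. This is only a sketch with a hypothetical compare helper; a pybind11 call additionally converts the Python floats to C++ doubles and the result back, so its overhead should be expected to be at least as large:

```python
import timeit

def summ(a, b):
    return a + b

def compare(number: int = 1_000_000):
    # a and b are looked up as variables, so Python cannot constant-fold "a + b".
    env = {"summ": summ, "a": 2.3, "b": 5.2}
    inline = timeit.timeit("a + b", globals=env, number=number)
    called = timeit.timeit("summ(a, b)", globals=env, number=number)
    return inline, called

if __name__ == "__main__":
    inline, called = compare()
    print(f"inline add: {inline:.4f} s, function call: {called:.4f} s")
```

The gap between the two numbers is a rough estimate of the per-call cost, which is what the 100-iteration test in the question mostly measures.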
Your loop count is too small.
The fewer the loops, the more random the result. As with the previous point, your test takes only a few thousandths of a second, so it is quite possible that the OS interfered with the running process. I think the test should run for at least a few seconds.
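A sketch of a more stable measurement with the standard timeit module: Timer.autorange() picks a loop count large enough that one run takes at least 0.2 seconds, and the measurement is then repeated, keeping the fastest run, since slower runs mostly reflect interference from other processes. The helper robust_time is hypothetical:

```python
import timeit

def summ(a, b):
    return a + b

def robust_time(stmt: str, env: dict, repeat: int = 5) -> float:
    timer = timeit.Timer(stmt, globals=env)
    # autorange() increases `number` until one run takes at least 0.2 s.
    number, _ = timer.autorange()
    # Repeat the whole measurement and keep the fastest run.
    runs = timer.repeat(repeat=repeat, number=number)
    return min(runs) / number  # seconds per single execution of stmt

if __name__ == "__main__":
    env = {"summ": summ, "a": 2.3, "b": 5.2}
    print(f"{robust_time('summ(a, b)', env) * 1e9:.1f} ns per call")
```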
C++ is not always faster than Python. Nowadays many Python modules/libraries can use the GPU for heavy computation, and can perform matrix operations in parallel even on the CPU alone. I guess you are evaluating whether to use pybind11 in your project. I think a comparison like this is not worth much on its own, because the best tool depends on the real requirements, but it is a good way to learn. I recently encountered a case in deep learning where Python was faster than C++. Haha, funny?
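As a small illustration of the library point, here is a sketch comparing a pure-Python loop with a NumPy call that runs the same loop in compiled code. Sum of squares is just a hypothetical workload; on most machines I would expect the NumPy version to win by a wide margin, but measure it on your own hardware:

```python
import time
import numpy as np

def loop_sum_of_squares(values) -> float:
    # Pure-Python loop: one interpreted iteration per element.
    total = 0.0
    for v in values:
        total += v * v
    return total

def numpy_sum_of_squares(arr: np.ndarray) -> float:
    # np.dot runs the whole multiply-add loop in compiled code.
    return float(np.dot(arr, arr))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random(1_000_000)
    values = data.tolist()  # plain Python floats for the loop version
    start = time.perf_counter()
    a = loop_sum_of_squares(values)
    loop_time = time.perf_counter() - start
    start = time.perf_counter()
    b = numpy_sum_of_squares(data)
    numpy_time = time.perf_counter() - start
    print(f"loop:  {a:.6f} in {loop_time:.4f} s")
    print(f"numpy: {b:.6f} in {numpy_time:.4f} s")
```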
In the end, I ran my sample on my PC and found that the C++ computation was up to 100 times faster than the Python one.
ComplexCpp.cpp:
#include <algorithm>  // std::max
#include <cmath>      // std::log
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
namespace py = pybind11;
// Same per-element computation as PyCompute in tryComplexCpp.py.
double Compute( double x, py::array_t<double> ys ) {
    // std::cout << "x:" << std::setprecision( 16 ) << x << std::endl;
    // 1-D view of the array without per-access bounds checks.
    auto r = ys.unchecked<1>();
    for( py::ssize_t i = 0; i < r.shape( 0 ); ++i ) {
        double y = r( i );
        // std::cout << "y:" << std::setprecision( 16 ) << y << std::endl;
        x += y;
        x *= y;
        y = std::max( y, 1.001 );
        x /= y;
        x *= std::log( y );
    }
    return x;
}
PYBIND11_MODULE( ComplexCpp, m ) {
    m.def( "Compute", &Compute, "a more complicated computation" );
}
tryComplexCpp.py:
import ComplexCpp
import math
import numpy as np
import random
import time
def PyCompute(x: float, ys: np.ndarray) -> float:
    # Same per-element computation as Compute() in ComplexCpp.cpp.
    #print(f'x:{x}')
    for y in ys:
        #print(f'y:{y}')
        x += y
        x *= y
        y = max(y, 1.001)
        x /= y
        x *= math.log(y)
    return x
LOOPS: int = 100000000
if __name__ == "__main__":
    # initialize random
    x0 = random.random()
    # Store all args in one array and pass that same array to both the C++
    # function and the Python function, so both sides get identical inputs.
    args = np.empty(LOOPS, dtype=np.float64)
    for i in range(LOOPS):
        args[i] = random.random()
    print('Args are ready, now start...')
    # try it with C++
    start_time = time.perf_counter()
    x = ComplexCpp.Compute(x0, args)
    print(f'Computing with C++ in {time.perf_counter() - start_time}.\n')
    # use the result, so the computation cannot be treated as dead code
    print(f'The result is {x}\n')
    # try it with Python
    start_time = time.perf_counter()
    x = PyCompute(x0, args)
    print(f'Computing with Python in {time.perf_counter() - start_time}.\n')
    # use the result, so the computation cannot be treated as dead code
    print(f'The result is {x}\n')
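One possible polish for the Python side: iterating over a NumPy array in a Python loop produces np.float64 scalar objects, which are typically slower to work with than plain Python floats, so converting once with ndarray.tolist() may make the comparison fairer to Python. A sketch, where py_compute is a hypothetical copy of PyCompute and the array size is arbitrary; both calls see the same double values, so the results should agree:

```python
import math
import numpy as np

def py_compute(x: float, ys) -> float:
    # Same arithmetic as PyCompute in tryComplexCpp.py.
    for y in ys:
        x += y
        x *= y
        y = max(y, 1.001)
        x /= y
        x *= math.log(y)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    args = rng.random(1_000)
    x0 = 0.5
    from_array = py_compute(x0, args)          # loops over np.float64 scalars
    from_list = py_compute(x0, args.tolist())  # loops over plain Python floats
    print(from_array, from_list)
```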