
"writing a python binding" vs "using command-line directly"


I have a question regarding Python bindings.

I have a command-line tool that exposes some functionality, and the code has been refactored to provide the same functionality through a shared library. I would like to know what real advantage I get from "writing a Python binding for the shared library" vs "calling the command-line tool directly".

One obvious advantage, I think, is performance: the shared library is loaded into the same process, so the functionality can be called in-process, avoiding the cost of spawning a new process for every command-line invocation.

Are there any other advantages to writing a Python binding in such a case?

Thanks.


Solution

  • I can hardly imagine a case where one would prefer wrapping a library's command-line interface over wrapping the library itself. (Unless the library comes with a neat command-line interface while being a total mess internally; but the OP indicates that the functionality available via the command line is just as easily accessible in terms of library function calls.)

    The biggest advantage of writing a Python binding is a clearly defined data interface between the library and Python. Ideally, the library can operate directly on memory managed by Python, without any data copying involved.
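
    As a toy illustration of that zero-copy idea, here is a minimal ctypes sketch in which a C function writes directly into memory owned by Python. It calls libc's memset and assumes a Unix-like system where find_library can locate the C runtime:

        import ctypes
        import ctypes.util

        # Load the C standard library (assumption: a Unix-like system).
        libc = ctypes.CDLL(ctypes.util.find_library("c"))
        libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
        libc.memset.restype = ctypes.c_void_p

        # A buffer allocated and owned by Python.
        buf = ctypes.create_string_buffer(16)

        # memset() writes straight into the Python-managed buffer;
        # nothing is copied between C and Python.
        libc.memset(buf, ord("x"), 8)
        print(buf.raw[:8])  # b'xxxxxxxx'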

    To illustrate this, let's assume a library function does something more complicated than printing the current time: it takes a significant amount of data as input, performs some operation, and returns a significant amount of data as output. If the input is expected as a file, Python would need to generate this file first, and it must make sure that the OS has finished writing the file before invoking the tool on the command line (I have seen several C libraries where sleep(1) calls were used as a band-aid for this issue...). And Python must get the output back in some way.
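
    A minimal sketch of that file-based round trip, with a hypothetical mytool executable and made-up --in/--out flags standing in for the real interface:

        import os
        import subprocess
        import tempfile

        input_data = b"\x00\x01\x02\x03"  # stand-in for a large binary input

        # Write the input file and make sure it has actually reached the
        # disk before the tool is invoked.
        with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
            f.write(input_data)
            f.flush()
            os.fsync(f.fileno())

        # "mytool", "--in" and "--out" are invented for this sketch.
        subprocess.run(["mytool", "--in", f.name, "--out", "out.bin"],
                       check=True)

        with open("out.bin", "rb") as g:
            output_data = g.read()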

    If the command line interface does not rely on files but obtains all arguments on the command line and prints the output on stdout, Python probably needs to convert between binary data and string format, not always with the expected results. It also needs to pipe stdout back and parse it. Not a problem, but getting all this right is a lot of work.
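
    Sketched with the same hypothetical mytool, the string conversion and stdout parsing might look like this:

        import subprocess

        values = [1.5, 2.5, 3.5]

        # Binary data has to be rendered as strings for the command line...
        args = [str(v) for v in values]

        # ...and stdout has to be piped back and parsed afterwards.
        proc = subprocess.run(["mytool", *args],
                              capture_output=True, text=True, check=True)
        results = [float(line) for line in proc.stdout.splitlines()]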

    What about error handling? The command-line interface will probably report errors by printing messages on stderr, so Python needs to capture, parse, and process those as well. The corresponding library function, on the other hand, will almost certainly expose a success flag or error code to the calling program, which is much more directly usable from Python.
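
    The contrast, again with hypothetical names (mytool, lib.process), might look like this:

        import subprocess

        # CLI route: success is a return code plus whatever text appears
        # on stderr, in a format that may change between versions.
        proc = subprocess.run(["mytool", "input.bin"],
                              capture_output=True, text=True)
        if proc.returncode != 0:
            raise RuntimeError("mytool failed: " + proc.stderr.strip())

        # Library route (hypothetical ctypes binding): the status is a
        # plain integer handed straight back to the caller.
        # status = lib.process(buf, len(buf))  # 0 on success
        # if status != 0:
        #     raise RuntimeError("process() failed with code %d" % status)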

    All of this obviously affects performance, which you already mentioned.

    As another point: if you are developing the library yourself, you will probably find after some time that the Python workflow has made the command-line interface obsolete, so you can stop supporting it altogether and save yourself a lot of time.

    So I think there is a clear case to be made for the Python bindings. To me, one of the biggest strengths of Python is the ease with which such wrappers can be created and maintained. Unfortunately, there are about 7 or 8 equally easy ways to do this. To get started, I recommend ctypes, since it does not require a compiler and also works with PyPy. For best performance, use the native CPython C API, which I also found very easy to learn.
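
    To show how little code a ctypes wrapper needs, here is a complete example that calls cos() from the system math library; it assumes a Unix-like system where find_library can locate libm:

        import ctypes
        import ctypes.util

        # Load the math library and declare the signature of cos().
        libm = ctypes.CDLL(ctypes.util.find_library("m"))
        libm.cos.argtypes = [ctypes.c_double]
        libm.cos.restype = ctypes.c_double

        print(libm.cos(0.0))  # 1.0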