Unix-based system. I'm trying to use as little overhead as possible in the code I'm working on (it runs in a resource-constrained environment). In this particular code, we are gathering some basic disk usage stats. One suggestion was to replace a call to `df` with `statfs`, since `df` is a C utility that requires its own subprocess to run, whereas `statfs` is a system call that presumably incurs less overhead (and is what `df` calls anyway).

We're calling `df` with Python's `subprocess.check_output()`:
import subprocess

DF_CMD = ["df", "-P", "-k"]

def get_disk_usage() -> str:
    try:
        output = subprocess.check_output(DF_CMD, text=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to execute {DF_CMD}: {e}") from e
    return output
I want to hard-code our mount points (which we've decided we're okay with) and replace the call to `df` with a call to `statfs <mountpoint>` in the above code. However, I'm unsure whether calling it through the same Python function will actually reduce overhead. I plan to use a profiler to check, but I'm curious whether anyone knows enough about the inner workings of Python/Unix to explain what's going on under the hood.
And to be clear: by "overhead" I mean CPU and memory usage on the OS/machine.
However, I'm unsure if calling with the same Python function will actually reduce overhead
Spawning a new process, via the `fork` and `execve` syscalls, is generally quite costly. This is one reason the shell is so slow: almost every piece of shell functionality is a separate process, and the shell also spawns subshells in many contexts. That said, today's computers are orders of magnitude faster, so the cost of spawning a single process is usually negligible; a typical modern machine runs thousands of processes.
Yes, replacing `subprocess` with `os.statvfs` will reduce the overhead. Unless you are working on a really resource-constrained device (say, one with 64 MB of memory), the savings are usually not worth the time; that said, the change is still worthwhile for making the code self-contained and clean and for reducing the number of possible failure modes. Python is fairly memory-hungry anyway, so the fact that you're running it at all suggests you have more than enough resources to spawn a single subprocess.