python, parallel-processing, multiprocessing, python-multiprocessing, process-pool

os.sched_getaffinity(0) vs os.cpu_count()


So, I know the difference between the two methods in the title, but not the practical implications.

From what I understand: If you use more NUM_WORKERS than there are cores actually available, you face big performance drops because your OS constantly switches back and forth trying to keep everything running in parallel. Don't know how true this is, but I read it here on SO somewhere from someone smarter than me.

And in the docs for os.cpu_count() it says:

Return the number of CPUs in the system. Returns None if undetermined. This number is not equivalent to the number of CPUs the current process can use. The number of usable CPUs can be obtained with len(os.sched_getaffinity(0))

So, I'm trying to work out what the "system" refers to if there can be more CPUs usable by a process than there are in the "system".

I just want to safely and efficiently implement multiprocessing.pool functionality. So here is my question summarized:

What are the practical implications of:

NUM_WORKERS = os.cpu_count() - 1
# vs.
NUM_WORKERS = len(os.sched_getaffinity(0)) - 1

The -1 is because I've found that my system is a lot less laggy if I leave a core free, so I can keep working while data is being processed.
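
For context, here is a sketch of how I'm thinking of combining the two portably (just my assumption of a reasonable fallback, since os.sched_getaffinity() isn't available on every platform, e.g. it's missing on macOS and Windows):

import os

# Sketch: derive the worker count from the CPUs this process may actually use.
# os.sched_getaffinity() is not available on every platform, so fall back to
# os.cpu_count() when it is missing.
try:
    usable_cpus = len(os.sched_getaffinity(0))
except AttributeError:
    usable_cpus = os.cpu_count() or 1

NUM_WORKERS = max(1, usable_cpus - 1)  # leave one core free so the machine stays responsive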


Solution

  • If you had tasks that were purely 100% CPU bound, i.e. did nothing but calculations, then clearly nothing would/could be gained by having a process pool size greater than the number of CPUs available on your computer. But what if there were a mix of I/O thrown in, whereby a process relinquishes the CPU while waiting for an I/O operation to complete (or, for example, for a URL to be returned from a website, which takes a relatively long time)? To me it's not clear that you couldn't achieve improved throughput in this scenario with a process pool size that exceeds os.cpu_count().

    Update

    Here is code to demonstrate the point. This code, which would probably be best served by using threading (a thread-based variant is sketched after the timings below), is using processes. I have 8 cores on my desktop. The program simply retrieves 54 URLs concurrently (or, in this case, in parallel). The program is passed an argument, the size of the pool to use. Unfortunately, there is initial overhead just to create additional processes, so the savings begin to fall off if you create too many of them. But if the task were long running and had a lot of I/O, then the overhead of creating the processes would be worth it in the end:

    from concurrent.futures import ProcessPoolExecutor, as_completed
    import requests
    from timing import time_it  # custom timing decorator (not in the standard library); see the stand-in sketch after the code
    
    def get_url(url):
        resp = requests.get(url, headers={'user-agent': 'my-app/0.0.1'})
        return resp.text
    
    
    @time_it
    def main(poolsize):
        urls = [
            'https://ibm.com',
            'https://microsoft.com',
            'https://google.com',
        ] * 18  # 54 URLs in total: the same three sites repeated 18 times
        with ProcessPoolExecutor(poolsize) as executor:
            futures = {executor.submit(get_url, url): url for url in urls}
            for future in as_completed(futures):
                text = future.result()
                url = futures[future]
                print(url, text[0:80])
                print('-' * 100)
    
    if __name__ == '__main__':
        import sys
        main(int(sys.argv[1]))
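
    The time_it decorator comes from a custom timing module, not the standard library; a minimal stand-in, assuming it only prints the function name, its arguments, and the elapsed wall-clock time, could look like this:

    # timing.py -- hypothetical stand-in for the custom helper used above
    import time
    from functools import wraps

    def time_it(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.time() - start
                print(f'func: {func.__name__} args: [{args}, {kwargs}] took: {elapsed} sec.')
        return wrapper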
    

    8 processes (the number of cores I have):

    func: main args: [(8,), {}] took: 2.316840410232544 sec.
    

    16 processes:

    func: main args: [(16,), {}] took: 1.7964842319488525 sec.
    

    24 processes:

    func: main args: [(24,), {}] took: 2.2560818195343018 sec.
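
    Since this workload is almost entirely I/O, the same pattern with threads sidesteps the process start-up cost. A minimal sketch, assuming the get_url function and urls list from above (and an example pool size of 24), only swaps the executor:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    # Threads are much cheaper to create than processes, so large pool sizes
    # are inexpensive for I/O-bound work such as fetching URLs.
    with ThreadPoolExecutor(max_workers=24) as executor:
        futures = {executor.submit(get_url, url): url for url in urls}
        for future in as_completed(futures):
            print(futures[future], future.result()[0:80])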