python, python-multithreading, netmiko

Understanding the speed difference in threading


This is the script both threading functions call:

def searchBadWireless( hub ):

    host = f'xxx.xxx.xxx.{hub}'
    results = {}
    try:
        netConnect = ConnectHandler( device_type=platform, ip=host, username=cisco_username, password=cisco_password )

        output = netConnect.send_command( 'sh int status | i 298|299' )
        netConnect.disconnect()

        results[ int( hub ) ] = output
    except Exception:
        print( f'{host} - Failed to connect' )
    return results

Now the first threading function I have completes in around 7 seconds:

def threadingProcess( execFunction ):

    switchList = getSwitchIPs()

    start = perf_counter()
    threads = []
    for ip in switchList:
        thread = threading.Thread( target=execFunction, args=( ip[ 0 ], ) )
        threads.append( thread )

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    finish = perf_counter()
    print(f"It took {finish-start} second(s) to finish.")

But the second one I have runs at around 32 seconds:

def newThreadProcess():

    switchList = getSwitchIPs()

    start = perf_counter()

    with ThreadPoolExecutor() as executor:
        results = executor.map(searchBadWireless, switchList)
        # for result in results:
            # print(result)

    finish = perf_counter()

    print(f"It took {finish-start} second(s) to finish.")

From what I have read online, the second approach is the recommended one, so why does it take so much longer to complete than the first? Is there a way to speed it up so it is as fast as the first function?


Solution

  • The first function is faster for the simple reason that all threads are started immediately: with N work items, you are launching N threads in parallel. If your machine can handle that load, it will be fast. The second function's ThreadPoolExecutor, by default, limits the number of concurrent threads by using a pool. To control the pool size, set the max_workers argument to the desired number of threads.

    Doc: Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
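    You can check what your machine reports and what pool size the executor actually picks. A small sketch, with the caveat that `_max_workers` is a private attribute of `ThreadPoolExecutor` (fine for debugging, not for production code), and that newer Python versions (3.8+) changed the default to `min(32, os.cpu_count() + 4)`:

    ```python
    import os
    from concurrent.futures import ThreadPoolExecutor

    # CPUs visible to Python; the 3.5-3.7 default pool size is 5x this
    print(os.cpu_count())

    # Peek at the pool size the executor actually chose
    # (_max_workers is a private attribute -- debugging only)
    with ThreadPoolExecutor() as executor:
        print(executor._max_workers)
    ```

    On a 4-core box under Python 3.5-3.7 this prints 4 and 20, which would explain 20-odd SSH sessions being batched instead of all running at once.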

    So it seems that your host has a low number of CPUs, which limits the number of threads in the pool. Theoretically, if max_workers were set equal to N (the number of work items), the throughput of both functions would be the same.
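    A minimal, self-contained sketch of that fix. Here `fake_task` and the 0.2 s sleep are stand-ins for the netmiko SSH round-trip, not the asker's actual code; the point is that with `max_workers=len(items)` every item runs concurrently, just like starting one `threading.Thread` per item:

    ```python
    import time
    from concurrent.futures import ThreadPoolExecutor

    def fake_task(n):
        # Simulate an I/O-bound call (e.g. an SSH session) taking ~0.2 s
        time.sleep(0.2)
        return n

    items = list(range(20))

    start = time.perf_counter()
    # One worker per item, mirroring the "start every thread at once" approach
    with ThreadPoolExecutor(max_workers=len(items)) as executor:
        results = list(executor.map(fake_task, items))
    elapsed = time.perf_counter() - start

    # All 20 sleeps overlap, so this finishes in roughly 0.2 s, not 20 x 0.2 s
    print(f"{len(results)} items in {elapsed:.2f} s")
    ```

    With the default pool on a machine whose pool size is smaller than 20, the same loop runs in batches and takes several times longer, which matches the 7 s vs 32 s gap in the question.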