python, python-multithreading, netmiko

Understanding the speed difference in threading


This is the script both threading functions call:

def searchBadWireless( hub ):

    host = f'xxx.xxx.xxx.{hub}'
    results = {}
    try:
        netConnect = ConnectHandler( device_type=platform, ip=host, username=cisco_username, password=cisco_password )

        output = netConnect.send_command( 'sh int status | i 298|299' )
        netConnect.disconnect()

        results[ int( hub ) ] = output
    except Exception:
        print( f'{host} - Failed to connect' )
    return results

Now the first threading function I have completes in around 7 seconds:

def threadingProcess( execFunction ):

    switchList = getSwitchIPs()

    start = perf_counter()
    threads = []
    for ip in switchList:
        thread = threading.Thread( target=execFunction, args=( ip[ 0 ], ) )
        threads.append( thread )

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    finish = perf_counter()
    print(f"It took {finish-start} second(s) to finish.")

But the second one I have runs at around 32 seconds:

def newThreadProcess():

    switchList = getSwitchIPs()

    start = perf_counter()

    with ThreadPoolExecutor() as executor:
        results = executor.map(searchBadWireless, switchList)
        # for result in results:
            # print(result)

    finish = perf_counter()

    print(f"It took {finish-start} second(s) to finish.")

From what I have read online, the second approach is the recommended one, so why does it take so much longer to complete than the first? Is there a way to speed it up so it is as fast as the first function?


Solution

  • The first function is faster for the simple reason that all threads are started immediately: with N work items, you are launching N threads in parallel. If your machine can handle that load, it will be fast. The second function's ThreadPoolExecutor, by default, limits the number of concurrent threads by using a pool. To control the pool size, set the max_workers argument to the desired number of threads.

    Doc: Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
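    You can check what your machine reports and what pool size the executor actually picks. A small sketch, with the caveat that `_max_workers` is a private attribute of `ThreadPoolExecutor` (fine for debugging, not for production code), and that newer Python versions (3.8+) changed the default to `min(32, os.cpu_count() + 4)`:

    ```python
    import os
    from concurrent.futures import ThreadPoolExecutor

    # CPUs visible to Python; the 3.5-3.7 default pool size is 5x this
    print(os.cpu_count())

    # Peek at the pool size the executor actually chose
    # (_max_workers is a private attribute -- debugging only)
    with ThreadPoolExecutor() as executor:
        print(executor._max_workers)
    ```

    On a 4-core box under Python 3.5-3.7 this prints 4 and 20, which would explain 20-odd SSH sessions being batched instead of all running at once.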

    So it seems that your host has a low number of CPUs, which limits the number of threads in the pool. Theoretically, if max_workers were set equal to N (the number of work items), the throughput of both functions would be the same.
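    A minimal, self-contained sketch of that fix. Here `fake_task` and the 0.2 s sleep are stand-ins for the netmiko SSH round-trip, not the asker's actual code; the point is that with `max_workers=len(items)` every item runs concurrently, just like starting one `threading.Thread` per item:

    ```python
    import time
    from concurrent.futures import ThreadPoolExecutor

    def fake_task(n):
        # Simulate an I/O-bound call (e.g. an SSH session) taking ~0.2 s
        time.sleep(0.2)
        return n

    items = list(range(20))

    start = time.perf_counter()
    # One worker per item, mirroring the "start every thread at once" approach
    with ThreadPoolExecutor(max_workers=len(items)) as executor:
        results = list(executor.map(fake_task, items))
    elapsed = time.perf_counter() - start

    # All 20 sleeps overlap, so this finishes in roughly 0.2 s, not 20 x 0.2 s
    print(f"{len(results)} items in {elapsed:.2f} s")
    ```

    With the default pool on a machine whose pool size is smaller than 20, the same loop runs in batches and takes several times longer, which matches the 7 s vs 32 s gap in the question.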