Tags: python, multithreading, parallel-processing, python-stackless

Python: Handling a large number of threads?


# data is a list

import threading

Threading_list = []

class myfunction(threading.Thread):

    def __init__(self, val):
        .......
    .......

    def run(self):
        .......
        .......

for i in range(100000):
    t = myfunction(data[i])  # need to execute this function on every data point
    t.start()
    Threading_list.append(t)

for t in Threading_list:
    t.join()

This will create around 100,000 threads, but I am only allowed to create a maximum of 32 threads. What modifications can be made to this code?


Solution

  • There is rarely a good reason to create that many Python threads; I can hardly imagine one. There are well-established architectural patterns for running code in parallel while limiting the number of threads. One of them is the reactor pattern.

    What are you trying to do?

    And remember that, due to the GIL, Python threads give no performance boost for computational tasks, even on multiprocessor and multi-core systems (BTW, can there be a 100,000-core system? I doubt it. :)). The only chance of a speedup is if the computational part is performed inside modules written in C/C++ that release the GIL while they work. Usually Python threads are used to parallelize code that performs blocking I/O operations.

    UPD: I noticed the stackless-python tag. AFAIK, it supports microthreads. However, it's still unclear what you are trying to do.

    And if you are just trying to process 100,000 values (applying a formula to each of them?), it's better to write something like:

    def myfunction(val):
        ....
        return something_calculated_from_val

    results = [myfunction(d) for d in data]  # or "list(map(myfunction, data))"
    

    It should be much faster, unless myfunction() performs some blocking I/O. If it does, a concurrent.futures.ThreadPoolExecutor may really help.
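To make the ThreadPoolExecutor suggestion concrete, here is a minimal sketch of capping the pool at the asker's limit of 32 worker threads. The body of `myfunction` is a stand-in (the original is elided), so the doubling below is purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def myfunction(val):
    # stand-in for the real work; a thread pool pays off
    # mainly when this performs blocking I/O
    return val * 2

data = list(range(100000))

# At most 32 threads run at once; executor.map returns
# results in the same order as the input.
with ThreadPoolExecutor(max_workers=32) as executor:
    results = list(executor.map(myfunction, data))
```

Each of the 100,000 work items is queued and picked up by one of the 32 workers, so the thread count stays bounded no matter how long the data list is, and the `with` block joins all workers on exit.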