python · parallel-processing · scrapy · ipython-parallel

Make 2 functions run at the same time and in parallel?


I have an array

myArray = array(url1,url2,...,url90)

I want to execute this command 3 times in parallel

scrapy crawl mySpider -a links=url

each time with a different url,

scrapy crawl mySpider -a links=url1
scrapy crawl mySpider -a links=url2
scrapy crawl mySpider -a links=url3

and when the first one finishes its job, it should pick up the next url, like

scrapy crawl mySpider -a links=url4

I read this question, and this one, and I tried this:

import threading
from threading import Thread

def func1(url):

    scrapy crawl mySpider links=url

if __name__ == '__main__':
    myArray = array(url1,url2,...,url90)
    for(url in myArray):
        Thread(target = func1(url)).start()

Solution

  • When you write target = func1(url) you actually run func1 immediately and pass its result to Thread (not a reference to the function). This means the functions run one after another inside the loop, not in separate threads.

    You need to rewrite it like that:

    if __name__ == '__main__':
        myArray = array(url1,url2,...,url90)
        for url in myArray:
            Thread(target=func1, args=(url,)).start()
    

    This tells Thread to run func1 with the argument tuple (url,).

    Also, you should wait for the threads to finish after the loop; otherwise your program will terminate just after starting all the threads.
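    A minimal runnable sketch of this start/join pattern — here func1 is a stand-in worker that appends to a list instead of invoking scrapy as a subprocess, and the urls are placeholders:

```python
from threading import Thread

results = []  # filled in by the stand-in worker

def func1(url):
    # stand-in for something like:
    # subprocess.run(["scrapy", "crawl", "mySpider", "-a", "links=" + url])
    results.append("done " + url)

myArray = ["url1", "url2", "url3"]  # placeholder urls

# pass the function itself plus its arguments; do NOT call func1 here
threads = [Thread(target=func1, args=(url,)) for url in myArray]
for t in threads:
    t.start()   # start() returns immediately; func1 runs in its own thread
for t in threads:
    t.join()    # wait here so the program does not exit before the work is done
```

    After the joins, every url has been processed, regardless of the order the threads happened to finish in.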

    EDIT: and if you want only 3 threads to run at the same time, you may want to use a ThreadPool:

    if __name__ == '__main__':
        from multiprocessing.pool import ThreadPool
    
        pool = ThreadPool(processes=3)
        pool.map(func1, myArray)
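
A self-contained sketch of the ThreadPool variant — again with func1 as a stand-in that returns a string instead of running the scrapy command, and placeholder urls:

```python
from multiprocessing.pool import ThreadPool

def func1(url):
    # stand-in for the scrapy subprocess call from the question
    return "crawled " + url

myArray = ["url%d" % i for i in range(1, 7)]  # placeholder urls url1..url6

pool = ThreadPool(processes=3)      # at most 3 worker threads run at once
results = pool.map(func1, myArray)  # blocks until every url has been processed
pool.close()
pool.join()
```

pool.map hands each worker the next url as soon as it finishes its current one — exactly the "when the first one finishes, it gets the next url" behaviour asked for — and it returns the results in input order.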