Tags: python, multithreading, multiprocessing, os.system, simultaneous

Run multiple .py scripts from another .py simultaneously, with arguments and a timeout


I have a program, let's say "main.py", which is run with an argument: "python main.py 3" or, for example, "python main.py 47". The argument selects a specific ID inside the program itself.

I'm trying to write another script, let's say "start.py", that starts a certain number of such programs. If inside start.py I have set threads = 4 and timeout = 5, then it should run "python main.py 1", "python main.py 2", "python main.py 3", and "python main.py 4" at the same time, but with a delay of 5 seconds between launching each command.

I know how to do this in a single thread, but then the next command does not run until the previous one completes:

import os
import time

threads = 4
id = 1
for i in range(threads):
    os.system(f"python main.py {id}")  # blocks until main.py exits
    id += 1
    time.sleep(5)

I am trying to do this via multiprocessing, but I am failing. What is the best way to implement this, and am I going in the right direction?

I've already done this in bash, but I need it done in Python.

for ((i=1; i<=4; i++))
do
    python3 main.py "$i" &  # '&' runs each command in the background
done

Solution

  • If you don't want to or can't make changes to main.py, then the simplest change you can make to your current code is to execute the system call in a thread so that you do not block:

    from threading import Thread
    import os
    import time
    
    def run_main(id):
        os.system(f"python main.py {id}")
    
    threads = 4
    id = 1
    started_threads = []
    for i in range(threads):
        if i != 0:
            time.sleep(5)  # delay before every launch except the first
        t = Thread(target=run_main, args=(id,))
        t.start()  # returns immediately; main.py runs in the background
        started_threads.append(t)
        id += 1
    for t in started_threads:
        t.join()  # wait for all the launched scripts to finish
    

    Note that I have moved the call to time.sleep so that it happens before every launch except the first; your version did one extra sleep after the final command that was not needed.
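
    Equivalently, you can leave the thread bookkeeping to the standard library's concurrent.futures module. A sketch of the same idea, keeping the 5-second stagger and the thread count from your example:

    from concurrent.futures import ThreadPoolExecutor
    import os
    import time
    
    def run_main(id):
        os.system(f"python main.py {id}")
    
    threads = 4
    with ThreadPoolExecutor(max_workers=threads) as executor:
        for id in range(1, threads + 1):
            if id != 1:
                time.sleep(5)  # stagger the launches
            executor.submit(run_main, id)
    # leaving the with-block waits for all submitted tasks to finish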

    But this is rather expensive, in that you are starting a new Python interpreter for each invocation of main.py. If I understand the comment offered by @BoarGules (although what he literally said would run the function main 4 times sequentially rather than in parallel), the following is an alternative implementation, assuming main.py is structured like the following:

    import sys
    
    def main(id):
        ... # process
    
    if __name__ == '__main__':
        main(sys.argv[1])
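
    One detail worth flagging: when main.py is run from the command line, main receives its ID as a string (sys.argv entries are always strings), whereas the start.py below passes an int. If the type matters inside main, convert at the boundary, e.g.:

    if __name__ == '__main__':
        main(int(sys.argv[1]))  # make the CLI path match the int passed by start.py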
    

    Then your start.py, if running under Linux or some other platform that uses fork to start new processes, can be coded as follows:

    from multiprocessing import Process
    import time
    import main
    
    threads = 4
    id = 1
    started_processes = []
    for i in range(threads):
        if i != 0:
            time.sleep(5)  # delay before every launch except the first
        p = Process(target=main.main, args=(id,))
        p.start()
        started_processes.append(p)
        id += 1
    for p in started_processes:
        p.join()  # wait for all the processes to finish
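
    If you are not sure which start method your platform defaults to, you can ask multiprocessing directly:

    import multiprocessing
    
    # 'fork' on most Linux systems; 'spawn' on Windows and, since Python 3.8, macOS
    print(multiprocessing.get_start_method())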
    

    But if you are running under Windows or some other platform that uses spawn to start new processes, then you must guard the top-level code with an if __name__ == '__main__': check, because each spawned child re-imports start.py and would otherwise start launching processes of its own:

    from multiprocessing import Process
    import time
    import main
    
    # the guard is required under spawn: it stops the spawned
    # children from re-executing this launching code on import
    if __name__ == '__main__':
        threads = 4
        id = 1
        started_processes = []
        for i in range(threads):
            if i != 0:
                time.sleep(5)
            p = Process(target=main.main, args=(id,))
            p.start()
            started_processes.append(p)
            id += 1
        for p in started_processes:
            p.join()
    

    And under spawn, each new Process instance you create ends up launching a new Python interpreter anyway, so you will not be saving much over the initial solution I offered.

    This is why, when you post a question tagged with multiprocessing, you are supposed to also tag the question with the platform.
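
    Finally, if starting a fresh interpreter per run is acceptable, you can skip threads and multiprocessing entirely and launch the scripts with subprocess.Popen, the Python analogue of the bash loop in the question, since Popen returns as soon as the process has started. A minimal sketch, assuming main.py is in the current directory:

    import subprocess
    import sys
    import time
    
    started_processes = []
    for id in range(1, 5):
        if id != 1:
            time.sleep(5)  # stagger the launches by 5 seconds
        # Popen starts main.py and returns immediately
        p = subprocess.Popen([sys.executable, "main.py", str(id)])
        started_processes.append(p)
    for p in started_processes:
        p.wait()  # wait for all of them to finish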