python, python-multithreading

Can I use sleep in the main thread without blocking other threads?


I run a Python script every hour, using time.sleep(3600) inside a while loop. It seems to work as needed, but I am worried about it blocking new tasks. From my research, sleep only blocks the current thread, but I want to be 100% sure. The hourly job shouldn't take more than 15 minutes, but if it does, or if it hangs, I don't want it to block the next one from starting. This is how I've done it:

import threading
import time


def long_hourly_job():
    # do some long task
    pass


if __name__ == "__main__":
    while True:
        thr = threading.Thread(target=long_hourly_job)
        thr.start()
        time.sleep(3600)

Is this sufficient?

Also, the reason I am using time.sleep for this hourly job rather than a cron job is that I want to do everything in code to make dockerization cleaner.


Solution

  • The code will work (sleep blocks only the calling thread), but there are some issues to be careful about. Some have already been raised in the comments, such as the possibility of consecutive jobs overlapping in time.
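
    A minimal sketch to check this claim for yourself: the main thread sleeps while a worker thread signals completion through an Event. The function name and the short sleep duration are illustrative choices, not part of the original code.

```python
import threading
import time


def demo_sleep_blocks_only_caller(sleep_seconds=0.5):
    """Return True if a worker thread finished while the caller was sleeping."""
    worker_done = threading.Event()

    def worker():
        # Runs in its own thread and signals as soon as it is done.
        worker_done.set()

    thr = threading.Thread(target=worker)
    thr.start()
    time.sleep(sleep_seconds)  # blocks only the calling thread
    finished_during_sleep = worker_done.is_set()
    thr.join()  # release the thread's resources
    return finished_during_sleep


if __name__ == "__main__":
    print(demo_sleep_blocks_only_caller())
```

    If sleep blocked every thread, the worker could not have set the event before the sleep returned.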

    The main issue is that your code slowly leaks resources. After a thread is created, the OS may keep some data structures around even after the thread has finished running, for example to hold the thread's exit status until its creator asks for it. The function that releases these structures (conceptually equivalent to closing a file) is called join. A thread that has finished running but has not been joined is termed a 'zombie thread'. The memory these structures need is very small, so your program could run for centuries before exhausting any reasonable amount of RAM. Nevertheless, it is good practice to join every thread you create.
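
    To make the join semantics concrete, here is a small sketch. Thread.join(timeout) always returns None, so the only way to tell a finished thread from a timed-out wait is to call is_alive() afterwards. The helper run_with_deadline, the two sample jobs, and the daemon=True flag (used here only so a leftover thread cannot keep the demo process alive) are assumptions made for illustration.

```python
import threading
import time


def run_with_deadline(job, timeout_seconds):
    """Run job in a thread, wait up to timeout_seconds, report whether it finished."""
    thr = threading.Thread(target=job, daemon=True)
    thr.start()
    thr.join(timeout_seconds)  # returns None whether or not the timeout expired
    return not thr.is_alive()  # True means the thread finished and was joined


def quick_job():
    pass  # finishes immediately


def slow_job():
    time.sleep(1.0)  # stands in for a job that overruns its deadline


if __name__ == "__main__":
    print(run_with_deadline(quick_job, 1.0))
    print(run_with_deadline(slow_job, 0.05))
```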

    A simple approach (if you know that 3600 s is more than enough time for the thread to finish) would be:

    if __name__ == "__main__":
        while True:
            thr = threading.Thread(target=long_hourly_job)
            thr.start()
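            deadline = time.monotonic() + 3600
            thr.join(3600)  # wait at most 3600 s for the thread to finish
            if thr.is_alive():  # join() always returns None, so check explicitly
                print("Ooops: the last job did not finish on time")
            # join() returns as soon as the job ends, so sleep out the rest of the hour
            time.sleep(max(0.0, deadline - time.monotonic()))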
    

    A better approach, if 3600 s may sometimes not be enough time for the thread to finish:

    if __name__ == "__main__":
        previous = []
        while True:
            thr = threading.Thread(target=long_hourly_job)
            thr.start()
            previous.append(thr)
            time.sleep(3600)
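            # iterate backwards so that removing an item does not shift unvisited ones
            for i in reversed(range(len(previous))):
                t = previous[i]
                t.join(0)  # non-blocking: only reaps the thread if it has already finished
                if t.is_alive():  # isAlive() was removed in Python 3.9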
                    print("Ooops: thread still running")
                else:
                    print("Thread finished")
                    previous.remove(t)
    

    The print calls are only placeholders: use logging instead.
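
    As one possible way to follow the logging advice, the reaping step could be pulled into a helper that logs instead of printing. This is only a sketch: the logger name "hourly_job" and the function reap_finished are hypothetical, and it builds a new list rather than removing items in place.

```python
import logging
import threading

logger = logging.getLogger("hourly_job")  # hypothetical logger name


def reap_finished(threads):
    """Join every finished thread and return the ones that are still running."""
    still_running = []
    for t in threads:
        t.join(0)  # non-blocking join
        if t.is_alive():
            logger.warning("thread %s is still running", t.name)
            still_running.append(t)
        else:
            logger.info("thread %s finished", t.name)
    return still_running
```

    Inside the hourly loop this would be used as previous = reap_finished(previous) after each sleep, with logging.basicConfig(level=logging.INFO) called once at startup so the messages are actually emitted.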