Tags: python, linux, service, python-daemon

Make a service relaunch a Python script when it has stalled in Linux


I am trying to run a Python script as a service in Linux. I found some good instructions here, and how to restart a failed script on the next run here.

However, I have another scenario: the script does not abort with a failure, it just stalls (it is downloading some resource and stays stuck at 99%). When I run it manually, I can watch it sit there for 1-2 minutes; then I force-abort the script (Ctrl-C), rerun it, and it works fine.

How can I make the service do the same? I could redirect the script's output to a file (right now it goes to STDOUT, which is where I observe the stall). Is there a way for the service to notice that the output file has not been updated in the last 5 minutes and then force-restart the script, even though the script is still running (but stalled)?


Solution

  • Instead of monitoring the output, you could add a timeout on the function that may cause the script to stall. This is explained here.

    Basically, you schedule an alarm signal whose handler raises an exception. If the function is stuck, the alarm fires and the exception interrupts it; if the function completes in time, you cancel the alarm and nothing happens.

    An example from the thread I linked:

    In [1]: import signal
    
    # Define a handler for the timeout
    In [2]: def handler(signum, frame):
       ...:     print("Forever is over!")
       ...:     raise Exception("end of time")
       ...: 
    
    # This function *may* run for an indeterminate time...
    In [3]: def loop_forever():
       ...:     import time
       ...:     while 1:
       ...:         print("sec")
       ...:         time.sleep(1)
       ...:         
       ...:         
    
    # Register the signal function handler
    In [4]: signal.signal(signal.SIGALRM, handler)
    Out[4]: 0
    
    # Define a timeout for your function
    In [5]: signal.alarm(10)
    Out[5]: 0
    
    In [6]: try:
       ...:     loop_forever()
       ...: except Exception as exc:
       ...:     print(exc)
       ...: 
    sec
    sec
    sec
    sec
    sec
    sec
    sec
    sec
    Forever is over!
    end of time
    
    # Cancel the timer if the function returned before timeout
    # (ok, mine won't but yours maybe will :)
    In [7]: signal.alarm(0)
    Out[7]: 0
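
To reuse this pattern around the one call that stalls, the alarm can be wrapped in a context manager and a retry loop. The following is a minimal sketch, not code from the linked thread: `StallTimeout`, `time_limit`, and `run_with_retries` are hypothetical names, and it assumes Python 3 on Linux with the guarded call running in the main thread (`signal.alarm` is Unix-only, and Python runs signal handlers only in the main thread).

```python
import signal
import time
from contextlib import contextmanager


class StallTimeout(Exception):
    """Raised when the guarded block runs longer than allowed (hypothetical name)."""


@contextmanager
def time_limit(seconds):
    """Raise StallTimeout if the with-block takes longer than `seconds` (an int)."""
    def handler(signum, frame):
        raise StallTimeout(f"no result after {seconds} s")

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)  # schedule SIGALRM
    try:
        yield
    finally:
        signal.alarm(0)  # cancel the alarm if it has not fired yet
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def run_with_retries(task, seconds, attempts=3):
    """Call task(); if it stalls past `seconds`, abandon it and try again."""
    for attempt in range(1, attempts + 1):
        try:
            with time_limit(seconds):
                return task()
        except StallTimeout:
            print(f"attempt {attempt} stalled, retrying")
    raise StallTimeout(f"gave up after {attempts} attempts")
```

Here `task` stands for whatever call performs the download. If every attempt stalls, the exception propagates; assuming it is left uncaught, the script exits with a failure, which a restart-on-failure service setup can then pick up.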
    

    The same thread also explains how to do this with multiprocessing.Process, which looks like this:

    import multiprocessing
    import time
    
    # Stand-in for the function that may stall
    def bar():
        for i in range(100):
            print("Tick")
            time.sleep(1)
    
    if __name__ == '__main__':
        # Start bar as a process
        p = multiprocessing.Process(target=bar)
        p.start()
    
        # Wait for 10 seconds or until process finishes
        p.join(10)
    
        # If the process is still active
        if p.is_alive():
            print("running... let's kill it...")
    
            # Terminate (SIGTERM) - may not work if the process is stuck for good
            p.terminate()
            # OR kill (SIGKILL, Python 3.7+) - will work for sure, but gives the process no chance to finish cleanly
            # p.kill()
    
            p.join()