python · web-scraping · cloud · remote-server

How to ensure a clean exit with a script hosted on the cloud?


I built a web scraper that scrapes multiple websites for data. The problem is that the data is very large and takes about 8 hours to scrape (I use sleep calls because I don't want to hammer their servers).

The cloud service I want to host it on will only run the script for 6 hours before killing it, so I'm making it pick up where it left off when restarted. How do I ensure a clean exit when the cloud service kills it? I don't want anything unexpected to happen to the data.


Solution

  • What you need is a graceful shutdown. In Python you can register handlers for signals. Usually, when the OS or the cloud platform's supervisor wants to terminate your script, it sends signal.SIGINT (Ctrl+C) or signal.SIGTERM. You can register handlers for these and terminate your program gracefully. Note that SIGKILL cannot be caught, so save your progress within whatever grace period the platform gives you after SIGTERM.

    See the Python signal module documentation for handler registration details and examples.
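
    As a rough sketch of the idea above: register a handler that sets a flag, check the flag between scraped items, and persist a resume point before exiting. The names `fetch_and_store`, `save_checkpoint`, and `checkpoint.txt` are illustrative, not from any particular platform's API.

    ```python
    import signal

    shutdown_requested = False

    def handle_signal(signum, frame):
        """Record that the platform asked us to stop; the main loop exits cleanly."""
        global shutdown_requested
        shutdown_requested = True

    # SIGTERM is what most cloud platforms send before force-killing the process;
    # SIGINT (Ctrl+C) makes local testing behave the same way.
    signal.signal(signal.SIGTERM, handle_signal)
    signal.signal(signal.SIGINT, handle_signal)

    def save_checkpoint(index, path="checkpoint.txt"):
        # Persist the index of the next item to scrape.
        with open(path, "w") as f:
            f.write(str(index))

    def load_checkpoint(path="checkpoint.txt"):
        # Resume from the saved index, or start from 0 on the first run.
        try:
            with open(path) as f:
                return int(f.read())
        except FileNotFoundError:
            return 0

    def fetch_and_store(url):
        pass  # placeholder for the real request + storage logic

    def scrape(urls):
        start = load_checkpoint()
        for i in range(start, len(urls)):
            if shutdown_requested:
                # Stop between items so no record is left half-written.
                save_checkpoint(i)
                return i
            fetch_and_store(urls[i])
        save_checkpoint(len(urls))
        return len(urls)
    ```

    Because the handler only sets a flag, the loop always finishes the item it is working on before exiting, which keeps the stored data consistent across restarts.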