Stopping long-sleep threads

Let's suppose I have a thread which should perform some task periodically but this period is ~~6 times each hour~~ 12 times each hour (every 5 minutes), I've often seen code which controls the thread loop with a is_running flag which is checked every loop, like this:

std::atomic<bool> is_running;

void start()
{
    is_running.store(true);
    std::thread { thread_function }.detach();
}

void stop()
{
    is_running.store(false);
}

void thread_function()
{
    using namespace std::literals;
    while (is_running.load())
    {
        // do some task...
        std::this_thread::sleep_for(5min);
    }
}

But if the stop() function is called, let's say, 1 millisecond after start() the thread would be alive for 299999 additional milliseconds until it awakes, checks the flag, and die.

Is my understanding correct? How to avoid keeping alive (but sleeping) a thread which should have been ended? My best approach until now is the following:

void thread_function()
{
    using namespace std::literals;
    while (is_running.load())
    {
        // do some task...
        for (unsigned int b = 0u, e = 1500u; is_running.load() && (b != e); ++b)
        {
            // 1500 * 200 = 300000ms = 5min
            std::this_thread::sleep_for(200ms);
        }
    }
}

Is there a less-dirty and more straightforward way to achieve this?

Solution

Use a condition variable. You wait on the condition variable or 5 minutes passing. Remember to check for spurious wakeups.

cppreference

I cannot find a good stack overflow post on how to use a condition variable in a minute or two of google searching. The tricky part is realizing that the wait can wake up with neither 5 minutes passing, nor a signal being sent. The cleanest way to handle this is to use the wait methods with a lambda that double-checks that the wakeup was a "good" one.

here is some sample code over at cppreference that uses wait_until with a lambda. (wait_for with a lambda is equivalent to wait_until with a lambda). I modified it slightly.

Here is an version:

struct timer_killer {
  // returns false if killed:
  template<class R, class P>
  bool wait_for( std::chrono::duration<R,P> const& time ) const {
    std::unique_lock<std::mutex> lock(m);
    return !cv.wait_for(lock, time, [&]{return terminate;});
  }
  void kill() {
    std::unique_lock<std::mutex> lock(m);
    terminate=true; // should be modified inside mutex lock
    cv.notify_all(); // it is safe, and *sometimes* optimal, to do this outside the lock
  }
  // I like to explicitly delete/default special member functions:
  timer_killer() = default;
  timer_killer(timer_killer&&)=delete;
  timer_killer(timer_killer const&)=delete;
  timer_killer& operator=(timer_killer&&)=delete;
  timer_killer& operator=(timer_killer const&)=delete;
private:
  mutable std::condition_variable cv;
  mutable std::mutex m;
  bool terminate = false;
};

live example.

You create a timer_killer in a shared spot. Client threads can wait_for( time ). If it returns false, it means you where killed before your wait was complete.

The controlling thread just calls kill() and everyone doing a wait_for gets a false return.

Note that there is some contention (locking of the mutex), so this isn't suitable for infinite threads (but few things are). Consider using a scheduler if you need to have an unbounded number of tasks that are run with arbitrary delays instead of a full thread per delayed repeating task -- each real thread is upwards of a megabyte of system address space used (just for the stack).

timer_killer is badly written. You should move the business logic out of the low level threading details, such as:

enum class notify {
  none,
  one,
  all,
};
 
template<class T>
struct waitable {
  // Returns true when F(T const&) returns true, false on timeout:
  template<class F, class R, class P> // requires F(T const&)->bool
  bool wait_for(std::chrono::duration<R,P> const& time, F f) const {
    auto l = lock();
    return cv.wait_for(l, time, f(t));
  }
  template<class F> // returns F(T const&)
  auto read( F f ) const {
    auto l = lock();
    return f(t);
  }
  template<class F> // requires F(T&)->enum notify
  void modify(F f) {
    auto l = lock();
    notify n = f(t);
    switch(n) {
      case notify::none: return;
      case notify::all: cv.notify_all(); return;
      case notify::one: cv.notify_one(); return;
    }
  }
  waitable( T in ): t(std::forward<T>(in)) {}
  
private:
  auto lock() const {
    return std::unique_lock<std::mutex>(m);
  }
  mutable std::mutex m;
  mutable std::condition_variable cv;
  T t;
};

now timer_killer is:

struct timer_killer {
  enum class wait_result {
    timeout,
    killed,
  };
  // returns true if killed
  template<class R, class P>
  wait_result wait_for( std::chrono::duration<R,P> const& time ) const {
    return flag.wait_for(
      time,
      [](bool terminate){ return terminate; }
    )?wait_result::killed:wait_result::timeout;
  }
  void kill() {
    flag.modify(
      [](bool& terminate){
        terminate=true;
        return notify::all;
      }
    );
  }
private:
  waitable<bool> flag = false;
};

this moves all of the std mutex/condition variable/etc logic out of the business logic of this specific class. Users of waitable merely have to write a function that checks if they should stop waiting (and return true), and write a function that modifies the state to indicate the waiting is over (then return if everyone, or just one, or no, waiter is needed to consume the state change).

I also changed the true/false return on wait_for to an enum - which is true and which is false is arbitrary in my experience and causes lots of bugs.

waitable seems relatively pointless and thin - but in practice, it prevents you from "accidentally" messing with the state (t) or fiddling with the state and not notifying or modifying the state while waiting and other things that cause bugs. Also, because how you lock, read, write and notify relative to each other is important and changing it can cause subtle bugs, making it harder to write bespoke code doing that is good.

There are some advanced uses of condition_variable that don't admit the waitable<T> pattern, but 99% uses of it fit perfectly, so the template really reduces how often you get to make errors in that code.