Tags: embedded, c++14, interrupt, volatile, bare-metal

Volatile and optimization with different compilation units


Please have a look at this more or less standard example that I see people use when discussing volatile in the context of embedded C++ firmware development on a bare-metal target:

// main.cpp
bool flag = false; // note: not volatile

void some_interrupt_routine()
{
    // hw: no nested interrupts possible
    flag = true;
}

void main()
{
    while (true) {
        // assume atomic reads/writes of "flag"
        if (flag == true) {
            flag = false;
            // do something
        }
    }
}

If flag is not declared volatile, most, if not all, compilers with optimization enabled will replace the loop in main() with an unconditional endless loop that does nothing, because nothing the compiler can see inside the loop ever sets flag to true. What happens if "do something" is a call to a function inside another compilation unit?

Please make the following assumptions:

  1. a gcc-like compiler with optimization enabled
  2. c++14 is used
  3. all cpp files are compiled separately into object files which are then linked into a final executable
  4. link time optimization is not enabled
  5. no hardware barrier, fence or other synchronisation instructions are available.
  6. no compiler barrier, fence or other synchronisation intrinsics are available, especially __asm volatile("" ::: "memory");
  7. std::atomic is not implemented by the compiler vendor
  8. reads and writes to the variable "flag" are atomic

My question is:

Is it safe to assume the compiler will not be able to optimize the loop as it can not know if the call to "do something" will change the value of "flag"?

Also: is it safe to assume, that the assembly code generated for the function "some_interrupt_routine" will always write to the variable "flag" as it is a global variable? My reasoning is that the compiler can't possibly know if code in another compilation unit might call some_interrupt_routine and check the value of flag afterwards. Of course the linker might remove the function from the final executable.

I did some tests and it seems my assumptions are right. I guess in this context it is not relevant that the value is changed in an interrupt routine or any other function inside another compilation unit.

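I also looked at other related questions, but most of them talk about either:

  • how volatile is not enough for synchronisation OR
  • how you should use std::atomic for variables shared between application and interrupts OR
  • how volatile should not be used at all but rather to use locking mechanisms provided by the os OR
  • how volatile should be used exclusively for accessing memory mapped registers

Most of these points are certainly true, but I feel like they do not answer my question. Don't get me wrong, I would love to use std::atomic, but it is simply not available to me.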

However, I might be wrong and just wanted to confirm this.


Solution

  • Is it safe to assume the compiler will not be able to optimize the loop as it can not know if the call to "do something" will change the value of "flag"?

    Pretty much, yeah. It cannot assume that the function call to the external translation unit did not update the variable. (One of many reasons why globals are bad, as opposed to static file scope variables.)


  • Also: is it safe to assume, that the assembly code generated for the function "some_interrupt_routine" will always write to the variable "flag" as it is a global variable? My reasoning is that the compiler can't possibly know if code in another compilation unit might call some_interrupt_routine and check the value of flag afterwards. Of course the linker might remove the function from the final executable.

    Calling an ISR from software is an unusual practice. Only some ISAs allow it, and it is still a rather strange thing to do. In general terms, though, the compiler may or may not treat the ISR differently because of whatever non-standard interrupt keyword, pragma, etc. you used to mark it as an ISR. That special treatment may or may not include the assumption that the ISR is never called from software.

    In general, this ISR-specific flagging is also what prevents the linker from removing the function. So I'm not quite sure what scenario you are asking about here.


    • how volatile is not enough for synchronisation OR

    Correct

    • how you should use std::atomic for variables shared between application and interrupts OR

    That is just one of several ways to protect against race condition bugs.

    • how volatile should not be used at all but rather to use locking mechanisms provided by the os OR

    That's wrong, and such posts are probably about PC programming rather than RTOS/bare metal. An RTOS might provide such means, though, and in that case do use them, together with volatile.

    • how volatile should be used exclusively for accessing memory mapped registers

    That is not the only use for it, no.
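As a sketch of that other use, here is the flag from the question declared volatile, so that every pass of the polling loop performs a real read of the variable. The helper poll_once() is a hypothetical name, not part of the original code; it wraps a single pass of the loop body so the behaviour can be tried on a host machine, where the "interrupt" is simply an ordinary function call.

```cpp
// volatile forces the compiler to re-read flag on every access instead of
// assuming that nothing visible in the loop can change it.
volatile bool flag = false;

// Stand-in for the ISR. On real hardware this would be entered by the
// interrupt controller, not called from software.
void some_interrupt_routine()
{
    flag = true;
}

// Hypothetical helper: one pass of the loop body from the question.
// Returns true if a pending event was seen and cleared.
bool poll_once()
{
    if (flag) {
        flag = false;
        // do something
        return true;
    }
    return false;
}
```

On the target, main() would simply call poll_once() inside while (true). This still relies on the question's assumption that reads and writes of flag are atomic; volatile alone does not provide that.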


    If you don't have access to atomic libraries, investigate other means of protection. The most obvious one is to temporarily disable the specific interrupt while the caller accesses the variable. This is quick, and on most MCUs the interrupt's pending flag is still set if the hardware event happens while the interrupt is disabled. The CPU then jumps into the ISR as soon as you enable the interrupt again, so it is unlikely that any information was lost. Whether you can do this is obviously very hardware-dependent.
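A minimal sketch of the disable-the-interrupt approach could look like the following. The functions disable_some_irq() and enable_some_irq() are hypothetical stand-ins for whatever register write your MCU uses to mask one interrupt source; here they only toggle a flag so the sketch compiles and runs on a host. shared_counter and read_counter() are likewise illustrative names.

```cpp
#include <cstdint>

// Hypothetical stand-ins: on real hardware these would write the interrupt
// controller's enable bit for this one interrupt source.
volatile bool irq_enabled = true;
void disable_some_irq() { irq_enabled = false; }
void enable_some_irq()  { irq_enabled = true; }

// 64-bit value: assumed NOT to be readable in one instruction on the MCU.
volatile std::uint64_t shared_counter = 0;

// ISR body. It needs no protection of its own because, by assumption,
// nothing can interrupt it.
void some_isr()
{
    shared_counter = shared_counter + 1;
}

// Called from main(): take a consistent copy of the shared variable.
std::uint64_t read_counter()
{
    disable_some_irq();                  // the ISR cannot run from here ...
    std::uint64_t copy = shared_counter; // ... so this multi-word read is safe
    enable_some_irq();                   // a pending interrupt fires now
    return copy;
}
```

Note that this sketch re-enables the interrupt unconditionally. If the interrupt might already be disabled when read_counter() is called, you would need to save and restore the previous enable state instead.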

    Another trick, usable if the ISR only triggers with sufficient time in between and interrupts cannot themselves be interrupted, is the "poor man's semaphore":

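    volatile bool poor_semaphore; // set by main() while it is reading data
    volatile type data;           // "type" is a placeholder for the shared data's type
    
    void some_isr ()
    {
      // skip the update while main() is in the middle of reading data
      if(!poor_semaphore)
        data = value;             // "value" is a placeholder for the new reading
    }
    
    void main ()
    {
      ...
      poor_semaphore = true;
      local_variable = data;      // the ISR will not modify data during this read
      poor_semaphore = false;
    }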
    

    Access to poor_semaphore is by no means atomic, but it is sequenced in relation to the access of the actual data, so in case the ISR triggers in the middle of writing to the semaphore - who cares. It's the situation where it triggers in the middle of the non-atomic data read/write we want to avoid.

    The drawback of this method is that the ISR sometimes skips an update, so the variable is not always refreshed in a timely manner, which may or may not be an issue. Some advanced flavour of this is to disable the global interrupt mask from the ISR when it finds the poor semaphore taken, then hang in a loop and let main() finish.
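For illustration, the poor man's semaphore above can be made concrete as follows. This is only a sketch under assumptions: the shared data is taken to be a 64-bit integer (not atomic on a typical 32-bit MCU), the ISR is given a parameter so that it can be simulated by a plain function call on a host, and read_data() is a hypothetical helper wrapping the guarded read from main().

```cpp
#include <cstdint>

volatile bool poor_semaphore = false; // true while main() is reading data
volatile std::uint64_t data = 0;      // assumed wider than one bus access

// Simulated ISR. A real ISR takes no arguments; "value" stands for
// whatever the hardware delivered with this interrupt.
void some_isr(std::uint64_t value)
{
    if (!poor_semaphore) {
        data = value; // only written when main() is not mid-read
    }
    // otherwise this update is dropped
}

// Hypothetical helper for the guarded read in main().
std::uint64_t read_data()
{
    poor_semaphore = true;
    std::uint64_t local_variable = data; // ISR leaves data alone meanwhile
    poor_semaphore = false;
    return local_variable;
}
```

Calling some_isr() while poor_semaphore is set leaves data unchanged, which is exactly the dropped update described above.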