c volatile

Why is the volatile keyword necessary in this program?


Related questions have been asked many times but they all seem to vaguely allude to possible optimizations a compiler might otherwise make, thereby necessitating our use of volatile to avoid said optimizations. Thus, I am here interested in understanding what optimizations a compiler would make or, equivalently, what would (or might) go wrong if I did not include volatile in the following snippet which polls some keyboard with a memory-mapped interface and then, once a character is received, sends it off to a display:

/* Define register addresses. */
#define KBD_DATA (volatile char *) 0x4000
#define KBD_STATUS (volatile char *) 0x4004
#define DISP_DATA (volatile char *) 0x4008
#define DISP_STATUS (volatile char *) 0x4012

void main() {
  char ch;
  /* Transfer the characters. */ 
  while (1) {
    while ((*KBD_STATUS & 0x2) == 0); 
    ch = *KBD_DATA;
    while ((*DISP_STATUS & 0x4) == 0); 
    *DISP_DATA = ch;
  } 
}

The above is from Computer Organization by Hamacher et al. They write the following:

Note that the KBD_STATUS and DISP_STATUS pointers are declared as being volatile. This is necessary because the program only reads the contents of the corresponding locations. No data are written to those locations. An optimizing compiler may remove program statements that appear to have no impact, which includes statements referring to locations in memory that are read but never written. Since the contents of the memory-mapped KBD_STATUS and DISP_STATUS registers change under influences external to the program, it is essential to inform the compiler of this fact. The compiler will not remove statements that involve pointers or other variables that are declared to be volatile.

The emphasis is mine in the above.

Rephrasing my question from above, I do not follow how or why an optimizing compiler could remove the program statements referring to KBD_STATUS and DISP_STATUS without changing the semantics of the program. That is, why would the compiler remove statements which involve these two memory locations specifically? Or are Hamacher et al. rather saying that the compiler might choose to read (a manner of speaking, I know this means "translate to assembly which is such that...") from these memory locations the very first time around and then assume that this memory location won't change, so that no subsequent assembly read instructions from these memory locations need be produced?


Solution

  • They're summarizing several cases.

    1. The variable has an initial value. If it's never assigned to, it's effectively a constant, and the compiler can replace any references to the variable with that initial value.
    2. The variable has no initial value. In that case, reads from the variable are indeterminate, and the compiler can treat it as having any value. So it can remove the reads and use some arbitrary value.
    3. Even if it doesn't remove all reads entirely, when there's a loop like the ones you show, it can assume that the value doesn't change during the loop, because nothing in the loop body writes to that location. So a loop like

    while ((*KBD_STATUS & 0x2) == 0);

    can be compiled as if it were

    if ((*KBD_STATUS & 0x2) == 0) {
        while (1) ; // infinite loop
    }
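To make case 3 concrete, here is a minimal sketch that can run on an ordinary hosted system. Everything in it is an assumption for illustration: `fake_status` stands in for the real register (the address 0x4004 is not mapped on a desktop machine), `device_tick` is a made-up device model, and the loops are bounded by `max_polls` so they terminate. `poll_hoisted` is a hand-written version of what an optimizer would be allowed to produce for a non-volatile pointer, not actual compiler output.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the keyboard status register. */
static unsigned char fake_status;

/* Hypothetical device model: raises the "ready" bit (0x2) after a few polls. */
static void device_tick(int poll)
{
    if (poll == 3)
        fake_status |= 0x2;
}

/* Reads the location on every iteration, which is what a
 * volatile-qualified access obliges the compiler to do. */
static bool poll_rereading(int max_polls)
{
    volatile unsigned char *status = &fake_status;
    for (int i = 0; i < max_polls; i++) {
        if ((*status & 0x2) != 0)
            return true;        /* saw the bit change */
        device_tick(i);
    }
    return false;
}

/* Hand-written equivalent of the hoisted loop: one read up front,
 * then the cached value is tested on every iteration. */
static bool poll_hoisted(int max_polls)
{
    unsigned char cached = fake_status;   /* the only read */
    for (int i = 0; i < max_polls; i++) {
        if ((cached & 0x2) != 0)
            return true;
        device_tick(i);         /* the change is never observed */
    }
    return false;
}
```

With `fake_status` cleared beforehand, `poll_rereading(10)` returns true once the simulated device sets the bit, while `poll_hoisted(10)` returns false because it keeps testing the stale value. In the original program the hoisted loop has no bound, so it would spin forever.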