r, scope, global-variables, stata

Examples of the perils of globals in R and Stata


In recent conversations with fellow students, I have been advocating for avoiding globals except to store constants. Ours is a fairly typical applied statistics program: everyone writes their own code and projects tend to be small, so it can be hard for people to see the trouble caused by sloppy habits.

In talking about avoidance of globals, I'm focusing on the following reasons why globals might cause trouble, but I'd like to have some examples in R and/or Stata to go with the principles (and any other principles you might find important), and I'm having a hard time coming up with believable ones.

A useful answer to this question would be a reproducible and self-contained code snippet in which globals cause a specific type of trouble, ideally with another code snippet in which the problem is corrected. I can generate the corrected solutions if necessary, so the example of the problem is more important.

Relevant links:

Global Variables are Bad

Are global variables bad?


Solution

  • I also have the pleasure of teaching R to undergraduate students who have no experience with programming. The problem I found is that most examples of when globals are bad are rather simplistic and don't really get the point across.

    Instead, I try to illustrate the principle of least astonishment. I use examples where it is tricky to figure out what is going on. Here are some examples:

    1. I ask the class to write down what they think the final value of i will be:

      i = 10
      for(i in 1:5)
          i = i + 1
      i
      

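      Some of the class guess correctly. Then I ask: should you ever write code like this?

      In some sense i is a global variable that is being changed: a for loop in R does not create its own scope, so the loop overwrites the i defined above it and the final value is 6.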

    2. What does the following piece of code return:

      x = 5:10
      x[x=1]
      

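      The problem is: what exactly do we mean by x? Here x=1 is neither an assignment nor a comparison. The `[` operator ignores argument names, so the expression is simply x[1].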

    3. Does the following function return a global or local variable:

      z = 0
      f = function() {
          if(runif(1) < 0.5)
              z = 1
          return(z)
      }
      

      Answer: both. Depending on the random draw, f returns either the local z (value 1) or the global z (value 0). Again, discuss why this is bad.
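
For reference, here is a rough sketch of what each of the three puzzles does, together with one possible way to remove the dependence on a global. The names bump_last, f_fixed and z_start are made up for illustration and are not part of the original examples. Other fixes, such as simply choosing distinct variable names, would work just as well.

```r
# Puzzle 1: the loop variable silently overwrites the global i.
i = 10
for (i in 1:5)
    i = i + 1
i_after_loop = i          # 6, not 10: the last pass computed 5 + 1

# Sketch of a fix: do the work inside a function (bump_last is a made-up name).
i = 10
bump_last = function(values) {
    for (i in values)     # this i lives in the function's own environment
        i = i + 1
    i
}
result = bump_last(1:5)   # 6
i                         # still 10

# Puzzle 2: `[` ignores argument names, so x[x = 1] is just x[1].
x = 5:10
first = x[x = 1]          # 5; x itself is not reassigned
matches = x[x == 1]       # what a comparison would give: integer(0)

# Puzzle 3: f() falls back to the global z whenever the if branch is skipped.
z = 0
f = function() {
    if (runif(1) < 0.5)
        z = 1
    return(z)
}

# Sketch of a fix: always create the local value first
# (f_fixed and z_start are hypothetical names).
f_fixed = function(z_start = 0) {
    z = z_start
    if (runif(1) < 0.5)
        z = 1
    z
}

z = 99                                # change the global
set.seed(1)
old_vals = replicate(200, f())        # mixes 1 with the global 99
new_vals = replicate(200, f_fixed())  # only ever 0 or 1
```

The point of the last few lines is that changing the global z changes what f returns, while f_fixed depends only on its argument.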