python numeric operation stability

Operations: Saving in Variables Then Operating vs Single Liners


I am writing a program in Python (using the numpy package) that contains a very long function involving many terms:

result = a + b + c + d +...

...and so on. The terms a, b, c, d, etc. are themselves matrices built from many operations. For example, in Python code:

a = np.identity(3, dtype = np.double)/3.0
b = np.kron(vec1, vec2).reshape(3,3) # Also with np.double precision.

Just taking two variables, I have been wondering if doing:

a = np.identity(3, dtype = np.double)/3.0
b = np.kron(vec1, vec2).reshape(3,3) # Also with np.double precision.
c = a + b

is the same as doing:

c = np.identity(3, dtype = np.double)/3.0 + np.kron(vec1, vec2).reshape(3,3)

This may sound silly, but I require very high numerical stability: introducing numerical errors, however subtle, might ruin the program or yield a weird result. Of course, this question can be extended to other programming languages.

Which is suggested? Does it matter? Any suggested references?


Solution

  • Under "normal" circumstances, both approaches are equivalent.

    In other words, whether you use a value through an explicit expression (e.g., np.identity(3, dtype = np.double)/3.0) or through a variable name that has been initialized with that expression (here, a), the outcome would "normally" be the same.

    There are some not-so-normal circumstances in which they may produce different results. As far as I can see, all of these involve side effects, so that the outcome depends on the order in which things happen. For example:

    Consider a scenario where the initialization of the variable b involves a side effect that affects the initialization of the variable a, and let's say your code depends on that side effect. In the first approach (where you first initialize the variables and then use only those variables), your code would have to initialize b first and a later: the order of initialization matters. In the second approach (where explicit expressions, rather than variable names, participate in a larger expression), to achieve the same effect you have to pay attention to the order in which the Python interpreter evaluates sub-expressions within an expression (operands are evaluated from left to right). If you don't, the order of evaluation may not produce the side effect that your code needs, and you might end up with a different result.

    As for other programming languages, the answer is a big yes: the two approaches can yield different results in languages (such as Java) where variable names have associated data types, which can cause silent numerical conversions (such as truncations) to happen during variable assignment.
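
The points above can be checked with a short script. This is only a sketch under stated assumptions: the values of vec1 and vec2 are made up (the question does not give them), term() is a hypothetical helper that exists only to make a side effect visible, and the float32 array at the end is a rough NumPy stand-in for a typed variable in another language, not a claim about how Java itself behaves.

```python
import numpy as np

# Assumed inputs: the question does not give vec1 and vec2.
vec1 = np.array([0.1, 0.2, 0.3], dtype=np.double)
vec2 = np.array([1.0 / 3.0, 2.0 / 7.0, 5.0 / 11.0], dtype=np.double)

# 1) Named intermediates vs. a single expression.
a = np.identity(3, dtype=np.double) / 3.0
b = np.kron(vec1, vec2).reshape(3, 3)
c_named = a + b
c_inline = np.identity(3, dtype=np.double) / 3.0 + np.kron(vec1, vec2).reshape(3, 3)

# Compare the raw bytes, so no tolerance can hide a difference.
same_bits = c_named.tobytes() == c_inline.tobytes()

# 2) Side effects: the order of evaluation becomes observable.
calls = []

def term(name, value):
    """Hypothetical helper: record that it was called, then return value."""
    calls.append(name)
    return value

# Inside one expression, Python evaluates operands left to right.
total_inline = term("a", 1.0) + term("b", 2.0)
inline_order = list(calls)

# With named intermediates, the order is whatever order you write.
calls.clear()
b_val = term("b", 2.0)
a_val = term("a", 1.0)
total_named = a_val + b_val
named_order = list(calls)

# 3) A narrower storage type changes the value on assignment.
third = 1.0 / 3.0
slot = np.zeros(1, dtype=np.float32)
slot[0] = third  # silently rounded to float32 precision
narrowed = float(slot[0]) != third

print(same_bits, inline_order, named_order, narrowed)
```

In the first part the two forms perform the same floating-point operations on the same inputs, so the byte comparison succeeds. The second and third parts show the two ways a difference can creep in: a changed evaluation order when side effects are present, and an intermediate stored at lower precision than the expression that produced it.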