Consider the following snippet of code:
import random
from uncertainties import unumpy, ufloat
x = [random.uniform(0,1) for p in range(1,8200)]
y = [random.randrange(0,1000) for p in range(1,8200)]
xerr = [random.uniform(0,1)/1000 for p in range(1,8200)]
yerr = [random.uniform(0,1)*10 for p in range(1,8200)]
x = unumpy.uarray(x, xerr)
y = unumpy.uarray(y, yerr)
diff = sum(x*y)
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff-x[k])**2 * y[k]
print(u)
If I try to run it on my computer, it takes up to 10 minutes to produce a result. I'm not really sure why this is the case and would appreciate some kind of explanation.
If I had to guess, I'd say the computation of the uncertainties is for some reason more complicated than one would think, but like I said, it's just a guess. Interestingly enough, the code finishes almost immediately if I remove the print instruction at the end, which honestly confuses me more than it helps...
In case you don't know it, this is the uncertainties library's repo.
I can reproduce this; the print is indeed what is taking forever. Or rather, it is the conversion to string implicitly called by print.
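You can confirm this with a quick timing sketch (assuming the snippet from the question, with a smaller array; exact numbers will vary by machine):

import time
import random
from uncertainties import unumpy, ufloat

n = 1000  # smaller than the original 8200, for patience
x = unumpy.uarray([random.uniform(0, 1) for _ in range(n)],
                  [random.uniform(0, 1) / 1000 for _ in range(n)])
y = unumpy.uarray([random.randrange(0, 1000) for _ in range(n)],
                  [random.uniform(0, 1) * 10 for _ in range(n)])

t0 = time.perf_counter()
diff = sum(x * y)
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff - x[k]) ** 2 * y[k]
t1 = time.perf_counter()
s = str(u)  # this is what print(u) does implicitly; it is the expensive step
t2 = time.perf_counter()
print("computation: %.3f s, str(): %.3f s" % (t1 - t0, t2 - t1))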
I used line_profiler to measure the time spent in the __format__ function of AffineScalarFunc. (It is called by __str__, which is called by print.)
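For reference, a sketch of one way to set up such a measurement with line_profiler's Python API (the @profile decorator plus kernprof works just as well; AffineScalarFunc.__format__ is an internal of uncertainties and may move between versions):

from line_profiler import LineProfiler
from uncertainties.core import AffineScalarFunc

lp = LineProfiler()
lp.add_function(AffineScalarFunc.__format__)  # measure this function line by line
lp.runcall(str, u)                            # u is the result from the snippet above
lp.print_stats()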
I decreased the array size from 8200 to 1000 to make it go a bit faster. This is the result (pruned for readability):
Timer unit: 1e-06 s
Total time: 29.1365 s
File: /home/veith/Projects/stackoverflow/test/lib/python3.6/site-packages/uncertainties/core.py
Function: __format__ at line 1813
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1813 @profile
1814 def __format__(self, format_spec):
1960
1961 # Since the '%' (percentage) format specification can change
1962 # the value to be displayed, this value must first be
1963 # calculated. Calculating the standard deviation is also an
1964 # optimization: the standard deviation is generally
1965 # calculated: it is calculated only once, here:
1966 1 2.0 2.0 0.0 nom_val = self.nominal_value
1967 1 29133097.0 29133097.0 100.0 std_dev = self.std_dev
1968
You can see that almost all of the time is taken in line 1967, where the standard deviation is computed. If you dig a bit deeper, you will find that the error_components property is the problem, inside which the derivatives property is the problem, inside which _linear_part.expand() is the problem. If you profile that, you begin to get to the root of the problem. Most of the work here is distributed fairly evenly:
Function: expand at line 1481
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1481 @profile
1482 def expand(self):
1483 """
1484 Expand the linear combination.
1485
1486 The expansion is a collections.defaultdict(float).
1487
1488 This should only be called if the linear combination is not
1489 yet expanded.
1490 """
1491
1492 # The derivatives are built progressively by expanding each
1493 # term of the linear combination until there is no linear
1494 # combination to be expanded.
1495
1496 # Final derivatives, constructed progressively:
1497 1 2.0 2.0 0.0 derivatives = collections.defaultdict(float)
1498
1499 15995999 4942237.0 0.3 9.7 while self.linear_combo: # The list of terms is emptied progressively
1500
1501 # One of the terms is expanded or, if no expansion is
1502 # needed, simply added to the existing derivatives.
1503 #
1504 # Optimization note: since Python's operations are
1505 # left-associative, a long sum of Variables can be built
1506 # such that the last term is essentially a Variable (and
1507 # not a NestedLinearCombination): popping from the
1508 # remaining terms allows this term to be quickly put in
1509 # the final result, which limits the number of terms
1510 # remaining (and whose size can temporarily grow):
1511 15995998 6235033.0 0.4 12.2 (main_factor, main_expr) = self.linear_combo.pop()
1512
1513 # print "MAINS", main_factor, main_expr
1514
1515 15995998 10572206.0 0.7 20.8 if main_expr.expanded():
1516 15992002 6822093.0 0.4 13.4 for (var, factor) in main_expr.linear_combo.items():
1517 7996001 8070250.0 1.0 15.8 derivatives[var] += main_factor*factor
1518
1519 else: # Non-expanded form
1520 23995993 8084949.0 0.3 15.9 for (factor, expr) in main_expr.linear_combo:
1521 # The main_factor is applied to expr:
1522 15995996 6208091.0 0.4 12.2 self.linear_combo.append((main_factor*factor, expr))
1523
1524 # print "DERIV", derivatives
1525
1526 1 2.0 2.0 0.0 self.linear_combo = derivatives
You can see that there are a lot of calls to expanded (almost 16 million, per the Hits column), each of which calls isinstance; individually cheap, in that quantity they are slow.
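A toy timeit measurement gives a feel for the per-call cost (machine-dependent; in the version profiled here, expanded() boils down to isinstance(self.linear_combo, dict), and the profile line above also includes method-call overhead):

import timeit

n = 10**6
per_call = timeit.timeit("isinstance(x, dict)", setup="x = {}", number=n) / n
print("%.0f ns per call -> ~%.1f s for 16 million calls"
      % (per_call * 1e9, per_call * 16e6))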
Also note the comments, which hint that this library deliberately calculates the derivatives only when they are required (and is aware that doing so eagerly would be really slow). This is why the conversion to string takes so long, and why the time is not spent earlier, during the computation itself.
In __init__ of AffineScalarFunc:
# In order to have a linear execution time for long sums, the
# _linear_part is generally left as is (otherwise, each
# successive term would expand to a linearly growing sum of
# terms: efficiently handling such terms [so, without copies]
# is not obvious, when the algorithm should work for all
# functions beyond sums).
In std_dev of AffineScalarFunc:
#! It would be possible to not allow the user to update the
#std dev of Variable objects, in which case AffineScalarFunc
#objects could have a pre-calculated or, better, cached
#std_dev value (in fact, many intermediate AffineScalarFunc do
#not need to have their std_dev calculated: only the final
#AffineScalarFunc returned to the user does).
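That missing cache can be observed from the outside: the expensive expansion happens only on the first access, so a second std_dev access on the same object is cheap (a sketch based on the snippet above; the standard deviation itself is recomputed each time, but from the already-expanded linear part):

import time

t0 = time.perf_counter()
sd = u.std_dev  # first access: triggers _linear_part.expand()
t1 = time.perf_counter()
sd = u.std_dev  # second access: the linear part is already expanded
t2 = time.perf_counter()
print("first: %.3f s, second: %.6f s" % (t1 - t0, t2 - t1))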
In expand of LinearCombination:
# The derivatives are built progressively by expanding each
# term of the linear combination until there is no linear
# combination to be expanded.
So all in all, this is somewhat expected: the library handles these non-native numbers, which (apparently) require a lot of operations to propagate the uncertainties, and it defers that work until the result is actually displayed.
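One practical consequence (a sketch, assuming the version profiled here): only operations that need the standard deviation trigger the expansion, so the nominal value stays cheap.

# u is the result from the question's snippet.
print(u.nominal_value)  # fast: no derivatives needed

# Anything needing the standard deviation (print(u), str(u), u.std_dev)
# triggers the full expansion of the linear part, once.
print(u.std_dev)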