Tags: python, performance, uncertainty

Unexpectedly long computation time with uncertainties package


Consider the following snippet of code:

import random
from uncertainties import unumpy, ufloat

x = [random.uniform(0,1) for p in range(1,8200)]
y = [random.randrange(0,1000) for p in range(1,8200)]
xerr = [random.uniform(0,1)/1000 for p in range(1,8200)]
yerr = [random.uniform(0,1)*10 for p in range(1,8200)]

x = unumpy.uarray(x, xerr)
y = unumpy.uarray(y, yerr)
diff = sum(x*y)
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff - x[k])**2 * y[k]

print(u)

If I try to run it on my computer, it takes up to 10 minutes to produce a result. I'm not really sure why this is the case and would appreciate some kind of explanation. If I had to guess, I'd say the computation of the uncertainties is for some reason more complicated than one would think, but like I said, it's just a guess. Interestingly enough, the code finishes almost immediately if I remove the print instruction at the end, which honestly confuses me more than it helps...

In case you don't know it, this is the uncertainties library's repo.


Solution

  • I can reproduce this: the print is what takes forever, or rather the conversion to string that print calls implicitly. I used line_profiler to measure the time of the __format__ method of AffineScalarFunc (it is called by __str__, which is called by print). I decreased the array size from 8200 to 1000 to make it run a bit faster. This is the result (pruned for readability):

    Timer unit: 1e-06 s
    
    Total time: 29.1365 s
    File: /home/veith/Projects/stackoverflow/test/lib/python3.6/site-packages/uncertainties/core.py
    Function: __format__ at line 1813
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      1813                                               @profile
      1814                                               def __format__(self, format_spec):
    
      1960                                           
      1961                                                   # Since the '%' (percentage) format specification can change
      1962                                                   # the value to be displayed, this value must first be
      1963                                                   # calculated. Calculating the standard deviation is also an
      1964                                                   # optimization: the standard deviation is generally
      1965                                                   # calculated: it is calculated only once, here:
      1966         1          2.0      2.0      0.0          nom_val = self.nominal_value
      1967         1   29133097.0 29133097.0    100.0          std_dev = self.std_dev
      1968                                           
    

    You can see that almost all of the time is spent on line 1967, where the standard deviation is computed. If you dig a bit deeper, you will find that std_dev spends its time in the error_components property, which spends it in the derivatives property, which in turn spends it in _linear_part.expand(). Profiling that function gets to the root of the problem. The work here is distributed fairly evenly:

    Function: expand at line 1481
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
      1481                                               @profile
      1482                                               def expand(self):
      1483                                                   """
      1484                                                   Expand the linear combination.
      1485                                           
      1486                                                   The expansion is a collections.defaultdict(float).
      1487                                           
      1488                                                   This should only be called if the linear combination is not
      1489                                                   yet expanded.
      1490                                                   """
      1491                                           
      1492                                                   # The derivatives are built progressively by expanding each
      1493                                                   # term of the linear combination until there is no linear
      1494                                                   # combination to be expanded.
      1495                                           
      1496                                                   # Final derivatives, constructed progressively:
      1497         1          2.0      2.0      0.0          derivatives = collections.defaultdict(float)
      1498                                           
      1499  15995999    4942237.0      0.3      9.7          while self.linear_combo:  # The list of terms is emptied progressively
      1500                                           
      1501                                                       # One of the terms is expanded or, if no expansion is
      1502                                                       # needed, simply added to the existing derivatives.
      1503                                                       #
      1504                                                       # Optimization note: since Python's operations are
      1505                                                       # left-associative, a long sum of Variables can be built
      1506                                                       # such that the last term is essentially a Variable (and
      1507                                                       # not a NestedLinearCombination): popping from the
      1508                                                       # remaining terms allows this term to be quickly put in
      1509                                                       # the final result, which limits the number of terms
      1510                                                       # remaining (and whose size can temporarily grow):
      1511  15995998    6235033.0      0.4     12.2              (main_factor, main_expr) = self.linear_combo.pop()
      1512                                           
      1513                                                       # print "MAINS", main_factor, main_expr
      1514                                           
      1515  15995998   10572206.0      0.7     20.8              if main_expr.expanded():
      1516  15992002    6822093.0      0.4     13.4                  for (var, factor) in main_expr.linear_combo.items():
      1517   7996001    8070250.0      1.0     15.8                      derivatives[var] += main_factor*factor
      1518                                           
      1519                                                       else:  # Non-expanded form
      1520  23995993    8084949.0      0.3     15.9                  for (factor, expr) in main_expr.linear_combo:
      1521                                                               # The main_factor is applied to expr:
      1522  15995996    6208091.0      0.4     12.2                      self.linear_combo.append((main_factor*factor, expr))
      1523                                           
      1524                                                       # print "DERIV", derivatives
      1525                                           
      1526         1          2.0      2.0      0.0          self.linear_combo = derivatives
    

    You can see that the loop body runs about 16 million times for an array of only 1000 elements, and that each iteration calls expanded, which calls isinstance, which is slow. Also note the comments, which hint that this library only calculates the derivatives when they are required (and is aware that doing otherwise would be really slow). This is why the conversion to string takes so long, while the arithmetic before it is fast.

    In __init__ of AffineScalarFunc:

    # In order to have a linear execution time for long sums, the
    # _linear_part is generally left as is (otherwise, each
    # successive term would expand to a linearly growing sum of
    # terms: efficiently handling such terms [so, without copies]
    # is not obvious, when the algorithm should work for all
    # functions beyond sums).
    

    In std_dev of AffineScalarFunc:

    #! It would be possible to not allow the user to update the
    #std dev of Variable objects, in which case AffineScalarFunc
    #objects could have a pre-calculated or, better, cached
    #std_dev value (in fact, many intermediate AffineScalarFunc do
    #not need to have their std_dev calculated: only the final
    #AffineScalarFunc returned to the user does).
    

    In expand of LinearCombination:

    # The derivatives are built progressively by expanding each
    # term of the linear combination until there is no linear
    # combination to be expanded.
    

    So all in all, this is somewhat expected: the library handles non-native numbers that (apparently) require a lot of operations to process, and it postpones those operations until the result is actually needed.
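
The deferred expansion can be illustrated with a toy model. This is only a sketch in plain Python, not the library's actual code: the names Lazy, expand and build are hypothetical, and the structure is a simplified guess at the situation in the question, where every term of the final sum refers to the same unexpanded sub-expression (diff). Building the expression does no expansion work at all. The cost is paid only when the result is expanded, and the number of steps grows roughly quadratically because the shared sub-expression is walked again for every term that refers to it.

```python
import collections


class Lazy:
    """Toy stand-in for an unexpanded linear combination (hypothetical)."""

    def __init__(self, terms):
        # terms: list of (factor, expr); expr is either another Lazy
        # or a str naming a leaf variable.
        self.terms = terms


def expand(root):
    """Flatten nested terms into {variable: coefficient}, counting steps."""
    derivatives = collections.defaultdict(float)
    stack = [(1.0, root)]
    steps = 0
    while stack:
        factor, expr = stack.pop()
        steps += 1
        if isinstance(expr, str):
            # Leaf variable: accumulate its coefficient.
            derivatives[expr] += factor
        else:
            # Nested combination: push its terms, scaled by the outer factor.
            for sub_factor, sub_expr in expr.terms:
                stack.append((factor * sub_factor, sub_expr))
    return derivatives, steps


def build(n):
    """Build a sum of n terms that all refer to one shared sum of n variables."""
    # Left-nested chain, similar to what a running sum produces.
    shared = Lazy([(1.0, "v0")])
    for i in range(1, n):
        shared = Lazy([(1.0, shared), (1.0, f"v{i}")])
    # Each term of the total points at the same shared object; nothing is
    # expanded here, so building stays cheap.
    total = Lazy([(2.0, shared)])
    for _ in range(1, n):
        total = Lazy([(1.0, total), (2.0, shared)])
    return total


for n in (10, 20, 40):
    derivatives, steps = expand(build(n))
    print(n, steps)
# 10 210
# 20 820
# 40 3240
```

Doubling n roughly quadruples the number of steps in this model, which matches the flavor of the profile above (about 16 million loop iterations for 1000 elements), although the real library's bookkeeping differs in its details.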