python list math product geometric-mean

Resolving Zeros in Product of items in list


If a list contains no zeros, we can easily convert between the product of its items and the sum of the logarithms of its items, e.g.:

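>>> import math
>>> from functools import reduce  # built in on Python 2, must be imported on Python 3
>>> from operator import mul
>>> pn = [0.4, 0.3, 0.2, 0.1]
>>> math.pow(reduce(mul, pn, 1), 1./len(pn))
0.22133638394006433
>>> math.exp(sum(0.25 * math.log(p) for p in pn))
0.22133638394006436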

How should we handle lists that contain 0s in Python, in a way that is both programmatically and mathematically correct?

More specifically, how should we handle cases like:

>>> pn = [0.4, 0.3, 0, 0]
>>> math.pow(reduce(mul, pn, 1), 1./len(pn))
0.0
>>> math.exp(sum(1./len(pn) * math.log(p) for p in pn))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
ValueError: math domain error

Is returning 0 really the right way to handle this? What is an elegant solution that takes the 0s in the list into account without the result collapsing to 0?

This is a kind of geometric average (the n-th root of the product of the list), and it's not very useful if it returns 0 just because there is a single 0 in the list.

Cross-posted from Math Stack Exchange (https://math.stackexchange.com/questions/1727497/resolving-zeros-in-product-of-items-in-list). There was no answer from the math people, so maybe the Python/code Jedis have better ideas for resolving this.


Solution

  • TL;DR: Yes, returning 0 is the only right way. (But see Conclusion.)

    Mathematical background

    In real analysis (i.e. not for complex numbers), the domain of log is traditionally taken to be the positive real numbers. We have the identity:

    x = exp(log(x)),   for x>0.
    

    It can be naturally extended to x=0, since the limit of the right-hand side as x->0+ is well defined and equal to 0. Moreover, it's legitimate to set log(0)=-inf and exp(-inf)=0 (again: only for real, not complex, numbers). Formally, we extend the set of real numbers by adding two elements, -inf and +inf, and defining a consistent arithmetic on them. (For our purposes, we need inf + x = inf for a real x, x * inf = inf for a real x > 0, inf + inf = inf, and likewise for -inf.)

    The other identity x = log(exp(x)) is less troublesome and holds for all real numbers (and even x=-inf or +inf).
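
    As a rough illustration of the extended arithmetic described above, Python floats already carry -inf and behave as required, while math.log itself does not implement the extension. The helper name extended_log is hypothetical and only serves this sketch:

```python
import math

NEG_INF = float("-inf")

# exp(-inf) is 0.0 for IEEE 754 floats, matching the limit of exp(log(x)) as x -> 0+.
assert math.exp(NEG_INF) == 0.0

# -inf absorbs finite addends and positive scale factors.
assert NEG_INF + 1.5 == NEG_INF
assert 0.5 * NEG_INF == NEG_INF

# math.log does not implement the extension: log(0) raises instead of returning -inf.
try:
    math.log(0)
except ValueError as err:
    print("math.log(0) raised:", err)


def extended_log(x):
    """Hypothetical helper: natural log with log(0) defined as -inf."""
    if x < 0:
        raise ValueError("extended_log is only defined for x >= 0")
    return NEG_INF if x == 0 else math.log(x)
```

    With this helper, math.exp(extended_log(0)) evaluates to 0.0, so the identity x = exp(log(x)) also holds at x=0.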

    Geometric mean

    The geometric mean can be defined for nonnegative numbers (some of which may be zero). For two numbers a, b (it generalizes naturally to more numbers, so I'll use only two from here on), it is

    gm(a,b) = sqrt(a*b),   for a,b >= 0.
    

    Clearly, gm(0,b)=0. Taking logarithms, we get:

    log(gm(a,b)) = (log(a) + log(b))/2
    

    which remains well defined even if a or b is zero. (We can plug in log(0) = -inf, and the identity still holds thanks to the extended arithmetic defined earlier.)
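
    The log form of the geometric mean can be sketched in Python as follows. This is only one possible way to write it, and the function name geometric_mean is chosen for this example. A zero entry is mapped to -inf by hand, because math.log(0) raises a ValueError:

```python
import math


def geometric_mean(values):
    """Geometric mean of non-negative numbers, computed via logarithms.

    Sketch only: a zero entry contributes log(0) = -inf, so the result is 0.0.
    """
    values = list(values)
    if not values:
        raise ValueError("geometric mean of an empty sequence is undefined")
    if any(v < 0 for v in values):
        raise ValueError("geometric mean requires non-negative values")
    # Treat log(0) as -inf instead of letting math.log raise.
    logs = [math.log(v) if v > 0 else float("-inf") for v in values]
    # A sum of -inf and finite terms is -inf, and exp(-inf) == 0.0.
    return math.exp(sum(logs) / len(values))


print(geometric_mean([0.4, 0.3, 0.2, 0.1]))
print(geometric_mean([0.4, 0.3, 0, 0]))  # 0.0
```

    For the list from the question, [0.4, 0.3, 0, 0], this returns 0.0 instead of raising, which agrees with the product form.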

    Interpretation

    Not surprisingly, the notion of the geometric mean hails from geometry and was originally (in ancient Greece) used for strictly positive numbers.

    Suppose we have a rectangle with sides of lengths a and b, and we want a square whose area equals the area of the rectangle. It is easy to see that the side of that square is the geometric mean of a and b.

    Now, if we take a = 0, then we don't really have a rectangle and this geometric interpretation breaks down. Similar problems can arise with other interpretations. We can mitigate this by considering, for example, degenerate rectangles and squares, but that may not always be a plausible approach.

    Conclusion

    It's up to the user (mathematician, engineer, programmer) how she interprets a geometric mean equal to zero. If it causes serious problems with the interpretation of the results or breaks a computer program, then perhaps the geometric mean was not the right mathematical model in the first place.


    Python

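    As already mentioned in the other answers, infinity is implemented for floats. With NumPy, np.exp(np.log(0)) emits a RuntimeWarning (divide by zero) because np.log(0) evaluates to -inf, but the result of the operation, 0.0, is correct. Note that math.log(0) from the standard library raises a ValueError instead, as shown in the question.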
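
    A minimal sketch of the NumPy behaviour, assuming NumPy is installed; np.errstate is used here only to silence the expected divide-by-zero warning locally:

```python
import numpy as np

pn = np.array([0.4, 0.3, 0.0, 0.0])

# np.log(0.0) yields -inf and emits a RuntimeWarning; errstate silences it locally.
with np.errstate(divide="ignore"):
    log_pn = np.log(pn)

# The mean of the logs is -inf, and np.exp(-inf) is 0.0.
gm = np.exp(log_pn.mean())
print(gm)
```

    Whether to silence the warning or let it surface is a design choice: leaving it on can be a useful signal that the input contained zeros.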