pythonsum

Avoiding Python sum default start arg behavior


I am working with a Python object that implements __add__, but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) led to a TypeError, becausesum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle MyObj + 0 or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.

Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.

Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLoc yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.

The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.

Questions I've looked at:

I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.

My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.

# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)

In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?


Solution

  • Instead of sum, use:

    import operator
    from functools import reduce
    reduce(operator.add, seq)
    

    in Python 2 reduce was built-in so this looks like:

    import operator
    reduce(operator.add, seq)
    

    Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.


    Also note: (Warning: maths rant ahead)

    Providing support for add w/r/t objects that have no neutral element is a bit awkward from the algebraic points of view.

    Note that all of:

    together with addition form a Monoid - i.e. they are associative and have some kind of neutral element.

    If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.

    In such case, you might be better off with using a function or a method instead of an operator. This may be less confusing since the users of your class, seeing that it supports +, are likely to expect that it will behave in a monoidic way (as addition normally does).


    Thanks for expanding, I'll refer to your particular module now:

    There are 2 concepts here:

    It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.

    OTOH, CompoundLocs with addition looks like a monoid to me (a commutative monoid, while we're at it): A sum of those is a CompoundLoc too, and their addition is associative, commutative and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.

    If you agree with me (and the above matches your implementation), then you'll be able to use sum as following:

    sum( [SimpleLoc1, SimpleLoc2, SimpleLoc3], start=ComplexLoc() )
    

    Indeed, this appears to work.


    I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.

    Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).

    As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in an utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.