python, python-3.x, nan, numerical

Propagation of NaN through calculations


Normally, NaN (not a number) propagates through calculations, so I don't need to check for NaN in each step. This works almost always, but apparently there are exceptions. For example:

>>> nan = float('nan')
>>> pow(nan, 0)
1.0
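pow(nan, 0) is not the only such case. The following Python 3 sketch collects a few standard-library calls where, as far as I can tell, a NaN input does not reach the output. The list is illustrative, not exhaustive.

```python
import math

nan = float('nan')
inf = float('inf')

# Ordinary arithmetic propagates NaN as expected.
assert math.isnan(nan + 1.0)
assert math.isnan(nan * 0.0)

# The result is the same for every other float, so the NaN is dropped.
print(pow(nan, 0))            # 1.0
print(nan ** 0.0)             # 1.0
print(1.0 ** nan)             # 1.0
print(math.pow(1.0, nan))     # 1.0
print(math.hypot(inf, nan))   # inf

# Every comparison with NaN is False, so max() depends on argument order.
print(max(1.0, nan))          # 1.0
print(max(nan, 1.0))          # nan
```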

I found the following comment on this:

The propagation of quiet NaNs through arithmetic operations allows errors to be detected at the end of a sequence of operations without extensive testing during intermediate stages. However, note that depending on the language and the function, NaNs can silently be removed in expressions that would give a constant result for all other floating-point values e.g. NaN^0, which may be defined as 1, so in general a later test for a set INVALID flag is needed to detect all cases where NaNs are introduced.

To satisfy those wishing a more strict interpretation of how the power function should act, the 2008 standard defines two additional power functions; pown(x, n) where the exponent must be an integer, and powr(x, y) which returns a NaN whenever a parameter is a NaN or the exponentiation would give an indeterminate form.
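To make the quoted description of powr concrete, here is a rough Python 3 sketch. The name powr comes from the quote; the set of special cases is my reading of it (NaN inputs, 0^0, inf^0, 1^inf, and a negative base), so treat this as an assumption and not as a conforming IEEE 754-2008 implementation.

```python
import math

def powr(x, y):
    """Sketch of powr: NaN for NaN inputs and for indeterminate forms."""
    nan = float('nan')
    if math.isnan(x) or math.isnan(y):
        return nan                      # NaN always propagates
    if x < 0:
        return nan                      # negative base is treated as invalid
    if y == 0 and (x == 0 or math.isinf(x)):
        return nan                      # 0^0 and inf^0
    if x == 1 and math.isinf(y):
        return nan                      # 1^inf
    if x == 0 and y < 0:
        return float('inf')             # math.pow would raise ValueError here
    try:
        return math.pow(x, y)
    except OverflowError:
        return float('inf')             # math.pow raises instead of overflowing

print(powr(float('nan'), 0.0))   # nan
print(powr(2.0, 3.0))            # 8.0
```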

Is there a way to check the INVALID flag mentioned above through Python? Alternatively, is there any other approach to catch cases where NaN does not propagate?
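I am not aware of a standard-library call that exposes the IEEE status flags for Python floats, so one possible workaround is to check the inputs explicitly. The sketch below (Python 3) wraps a function so that any float NaN among its positional arguments forces a NaN result. The names propagate_nan and strict_pow are hypothetical.

```python
import functools
import math

def propagate_nan(func):
    """Wrap func so that a float NaN in any positional argument yields NaN."""
    @functools.wraps(func)
    def wrapper(*args):
        if any(isinstance(a, float) and math.isnan(a) for a in args):
            return float('nan')
        return func(*args)
    return wrapper

strict_pow = propagate_nan(pow)

print(strict_pow(float('nan'), 0))   # nan
print(strict_pow(2.0, 0))            # 1.0
```

This only guards the call sites that are wrapped, so it still relies on knowing which functions can swallow a NaN.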

Motivation: I decided to use NaN for missing data. In my application, missing inputs should result in missing result. It works great, with the exception I described.


Solution

  • I've come across a similar problem (i.e. pow(float('nan'), 1) throws an exception in some Python implementations, e.g. Jython 2.5.2b2), and I found the above answers weren't quite what I was looking for.

    Using a MissingData type as suggested by 6502 seems like the way to go, but I needed a concrete example. I tried Ethan Furman's NullType class but found that it didn't work with any arithmetic operations, as it doesn't coerce data types (see below). I also didn't like that it explicitly named each arithmetic function that was overridden.

    Starting with Ethan's example and tweaking code I found on ActiveState (by mark andrew), I arrived at the class below. Although the class is heavily commented you can see that it actually only has a handful of lines of functional code in it.

    The key points are:

    1. Use __coerce__() to return two NoData objects for mixed-type (e.g. NoData + float) arithmetic operations, and two strings for string-based (e.g. concat) operations.
    2. Use __getattr__() to return a callable NoData() object for all other attribute/method access.
    3. Use __call__() to implement all other methods of the NoData() object, by returning a NoData() object.

    Here are some examples of its use (Python 2).

    >>> nd = NoData()
    >>> nd + 5
    NoData()
    >>> pow(nd, 1)
    NoData()
    >>> math.pow(NoData(), 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: nb_float should return float object
    >>> nd > 5
    NoData()
    >>> if nd > 5:
    ...     print "Yes"
    ... else:
    ...     print "No"
    ... 
    No
    >>> "The answer is " + nd
    'The answer is NoData()'
    >>> "The answer is %f" % (nd)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: float argument required, not instance
    >>> "The answer is %s" % (nd)
    'The answer is '
    >>> nd.f = 5
    >>> nd.f
    NoData()
    >>> nd.f()
    NoData()
    

    I noticed that the built-in pow goes through the same operator method as ** and hence works with NoData, but math.pow does not, because it first tries to convert the NoData() object to a float. I'm happy using the built-in pow; hopefully 6502 and the others were using math.pow when they reported problems with pow in their comments above.
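The difference between the two functions can be seen in isolation. This is a minimal Python 3 sketch using a hypothetical Missing class that defines only the power operators; it is not the NoData class from this answer.

```python
import math

class Missing:
    """Hypothetical stand-in that absorbs the power operator."""
    def __pow__(self, other, modulo=None):
        return self
    def __rpow__(self, other, modulo=None):
        return self
    def __repr__(self):
        return 'Missing()'

m = Missing()
print(pow(m, 1))      # Missing()  (dispatches to Missing.__pow__)
print(2 ** m)         # Missing()  (falls back to Missing.__rpow__)

try:
    math.pow(m, 1)    # math.pow needs real floats, so it cannot propagate m
except TypeError as exc:
    print("math.pow failed:", exc)
```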

    The other issue I can't see a way to solve is %f string formatting: no NoData method is called in this case, and the operator simply fails if it isn't given a float. Anyway, here's the class itself.

    class NoData():
        """NoData object - any interaction returns NoData().

        Python 2 only: relies on old-style classes and __coerce__."""
        def __str__(self):
            #I want '' returned as it represents no data in my output (e.g. csv) files
            return ''        
        
        def __unicode__(self):
            return ''
            
        def __repr__(self):
            return 'NoData()'
            
        def __coerce__(self, other_object):
            if isinstance(other_object, basestring):
                #Return string objects when coerced with another string object.
                #This ensures that e.g. concatenation operations produce strings.
                return repr(self), other_object  
            else:
                #Otherwise return two NoData objects - these will then be passed to the appropriate
                #operator method for NoData, which should then return a NoData object
                return self, self
         
        def __nonzero__(self):
            #__nonzero__ is called whenever a NoData object is tested for truth,
            #e.g. "if NoData():". Because every operation involving NoData returns
            #NoData, a comparison such as "nd > 5" used in a branch ends up here
            #and evaluates as False.
            return False        
    
        def __hash__(self):
            #prevent NoData() from being used as a key for a dict or used in a set
            raise TypeError("Unhashable type: " + repr(self))
    
        def __setattr__(self, name, value):
            #This is overridden to prevent any attributes from being created on NoData when e.g. "NoData().f = x" is called
            return None       
           
        def __call__(self, *args, **kwargs):
            #if a NoData object is called (i.e. used as a method), return a NoData object
            return self    
        
        def __getattr__(self,name):
            #For all other attribute accesses or method accesses, return a NoData object.
            #Remember that the NoData object can be called (__call__), so if a method is called, 
            #a NoData object is first returned and then called.  This works for operators,
            #so e.g. NoData() + 5 will:
            # - call NoData().__coerce__, which returns a (NoData, NoData) tuple
            # - call __getattr__, which returns a NoData object
            # - call the returned NoData object with args (self, NoData)
            # - this call (i.e. __call__) returns a NoData object   
            
            #For attribute accesses NoData will be returned, and that's it.
            
            #print name #(uncomment this line for debugging purposes i.e. to see that attribute was accessed/method was called)
            return self
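The class above depends on __coerce__ and old-style classes, neither of which exists in Python 3. There, operators are looked up on the type and bypass __getattr__, so each operator method has to be installed explicitly. The following is a minimal Python 3 sketch of the same idea under that assumption. The helper _absorb and the operator list are my own choices, and the list is not complete. It also defines __format__, which lets str.format and f-strings accept a float format spec (the %f operator is not covered).

```python
class NoData:
    """Python 3 sketch: arithmetic and comparisons involving NoData return NoData."""

    def __repr__(self):
        return 'NoData()'

    def __str__(self):
        return ''            # empty field in output such as CSV

    def __format__(self, format_spec):
        return ''            # ignore specs such as 'f' or '.2f'

    def __bool__(self):
        return False         # Python 3 name for __nonzero__

    __hash__ = None          # unhashable: not usable as a dict key or in a set

    def __call__(self, *args, **kwargs):
        return self

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Special-method names still
        # raise AttributeError so that protocol checks are not misled.
        if name.startswith('__') and name.endswith('__'):
            raise AttributeError(name)
        return self

    def __setattr__(self, name, value):
        pass                 # silently ignore attribute assignment


def _absorb(self, *args):
    """Shared body for every operator method: the result is NoData."""
    return self

# Install forward and reflected arithmetic operators on the class.
for _op in ['add', 'sub', 'mul', 'truediv', 'floordiv', 'mod', 'pow']:
    setattr(NoData, '__%s__' % _op, _absorb)
    setattr(NoData, '__r%s__' % _op, _absorb)

# Ordering comparisons and unary operators have no reflected variant to add.
for _op in ['lt', 'le', 'gt', 'ge', 'neg', 'pos', 'abs']:
    setattr(NoData, '__%s__' % _op, _absorb)

nd = NoData()
print(repr(nd + 5))                          # NoData()
print(repr(5 + nd))                          # NoData()
print(repr(pow(nd, 1)))                      # NoData()
print("yes" if nd > 5 else "no")             # no
print(repr("The answer is {:f}".format(nd))) # 'The answer is '
```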