floating-point, language-agnostic, nan, ieee-754

Why is NaN not equal to NaN?


The relevant IEEE standard defines a numeric constant NaN (not a number) and prescribes that NaN should compare as not equal to itself. Why is that?

All the languages I'm familiar with implement this rule. But it often causes significant problems, for example unexpected behavior when NaN is stored in a container or appears in data that is being sorted. Not to mention, the vast majority of programmers expect any object to be equal to itself (before they learn about NaN), so surprising them adds to the bugs and confusion.
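For instance, in Python (used here purely for illustration; any language that follows IEEE 754 shows the same surprises):

    nan = float("nan")

    # The one value that is not equal to itself.
    print(nan == nan)                # False

    # Membership tests become inconsistent: Python checks identity before
    # equality, so the "same" NaN object is found, but an equal-looking one is not.
    print(nan in [nan])              # True
    print(float("nan") in [nan])     # False

    # Sorting relies on <, and every comparison involving NaN is False,
    # so the result is generally not fully ordered.
    print(sorted([3.0, nan, 1.0, 2.0]))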

IEEE standards are well thought out, so I am sure there is a good reason why NaN comparing as equal to itself would be bad. I just can't figure out what it is.

Edit: please refer to What is the rationale for all comparisons returning false for IEEE754 NaN values? as the authoritative answer.


Solution

  • My original answer (from 4 years ago) criticizes the decision from the modern-day perspective without understanding the context in which the decision was made. As such, it doesn't answer the question.

    The correct answer is given here:

    NaN != NaN originated out of two pragmatic considerations:

    [...] There was no isnan( ) predicate at the time that NaN was formalized in the 8087 arithmetic; it was necessary to provide programmers with a convenient and efficient means of detecting NaN values that didn’t depend on programming languages providing something like isnan( ) which could take many years
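    In other words, the comparison rule itself served as the NaN test. A minimal sketch (Python used just for illustration; the original context was 8087 arithmetic and early compiled languages):

        def is_nan(x):
            # Relies on the IEEE 754 rule that NaN is the only value
            # that compares unequal to itself.
            return x != x

        print(is_nan(float("nan")))  # True
        print(is_nan(1.0))           # False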

    There was one disadvantage to that approach: it made NaN less useful in many situations unrelated to numerical computation. For example, much later when people wanted to use NaN to represent missing values and put them in hash-based containers, they couldn't do it.
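    To make that concrete, here is a sketch of the missing-value use case (Python, with a hypothetical counts mapping): because every equality check against NaN fails, a stored NaN key can never be found again by a NaN produced elsewhere.

        missing = float("nan")            # hypothetical "missing value" marker
        counts = {missing: 3}

        # A NaN computed elsewhere never matches the stored key, because the
        # lookup's equality comparison fails even when the hashes agree.
        print(float("nan") in counts)     # False
        print(counts.get(float("nan")))   # None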

    If the committee had foreseen future use cases and considered them important enough, they could have kept equality reflexive and offered a more verbose NaN test that relies only on NaN being unordered, such as !(x < 0.0 || x >= 0.0), instead of x != x. However, their focus was more pragmatic and narrow: providing the best solution for numeric computation, and as such they saw no issue with their approach.
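    A sketch of such an alternative test (the exact expression is only an illustration; it works because NaN is unordered with every value, including 0.0):

        def is_nan_without_neq(x):
            # For any real number, exactly one of x < 0.0 or x >= 0.0 holds;
            # for NaN every ordered comparison is False, so neither holds.
            return not (x < 0.0 or x >= 0.0)

        print(is_nan_without_neq(float("nan")))   # True
        print(is_nan_without_neq(-0.0))           # False
        print(is_nan_without_neq(float("inf")))   # False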

    ===

    Original answer:

    I am sorry: much as I appreciate the thought that went into the top-voted answer, I disagree with it. NaN does not mean "undefined" - see http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF, page 7 (search for the word "undefined"). As that document confirms, NaN is a well-defined concept.

    Furthermore, the IEEE approach was to follow the regular rules of mathematics as much as possible, and where that was not possible, to follow the rule of "least surprise" - see https://stackoverflow.com/a/1573715/336527. Any mathematical object is equal to itself, so the rules of mathematics would imply that NaN == NaN should be True. I cannot see any valid and powerful reason to deviate from such a major mathematical principle (not to mention the less important rules of trichotomy of comparison, etc.).
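    To illustrate why giving up reflexivity is surprising (Python shown only as an example): code that assumes "equal values collapse to one" treats NaN inconsistently.

        values = [float("nan"), float("nan"), 1.0, 1.0]

        # The two 1.0s collapse to a single element, but each NaN is
        # "not equal" to anything, so both survive deduplication.
        print(set(values))    # e.g. {nan, nan, 1.0} (order may vary)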

    As a result, my conclusion is as follows.

    The IEEE committee members did not think this through very clearly and made a mistake. Since very few people understood the committee's approach, or cared what exactly the standard says about NaN (to wit: most compilers' treatment of NaN violates the IEEE standard anyway), nobody raised an alarm. Hence this mistake is now embedded in the standard. It is unlikely to be fixed, since such a fix would break a lot of existing code.

    Edit: Here is one post from a very informative discussion. Note: to get an unbiased view you have to read the entire thread, as Guido takes a different view from that of some other core developers. However, Guido is not personally interested in this topic and largely follows Tim Peters' recommendation. If anyone has Tim Peters' arguments in favor of NaN != NaN, please add them in the comments; they have a good chance of changing my opinion.