Difference between NA_real_ and NaN


When I apply .Internal(inspect()) to NA_real_ and NaN, it returns:

> .Internal(inspect(NA_real_))
@0x000001e79724d0e0 14 REALSXP g0c1 [REF(2)] (len=1, tl=0) nan
> .Internal(inspect(NaN))
@0x000001e797264a88 14 REALSXP g0c1 [REF(2)] (len=1, tl=0) nan

It seems like their only difference is the memory address.

However, when I coerce the NA_real_ and NaN into character, it returns,

> as.character(c(NaN, NA_real_))
[1] "NaN" NA

I understand why the result looks like this: NaN has no character counterpart, so it is coerced to "NaN", whereas NA_real_ is coerced to NA_character_. But considering that their guts appear to be the same, how can R return different results for them?

Thank you in advance for any suggestions!


Solution

  • Well. First off, remember that NA is an R concept that has no equivalent in C. So, by necessity, NA needs to be represented differently in C. The fact that .Internal(inspect()) does not make this distinction doesn’t mean it isn’t made elsewhere. In fact, it so happens that .Internal(inspect()) uses Rprintf to print the value’s internal double floating point representation. And, indeed, R NAs are encoded as an NaN value in a C floating point type.

    Secondly, you observe that “their only difference is the memory address.” — So what? At least conceptually, distinct memory addresses are fully sufficient to distinguish NA and NaN, nothing more is required.

    But as a matter of fact R distinguishes these values by a different route. This is possible because the IEEE 754 double precision floating point format has multiple different representations of NaN, and R reserves a specific one for NAs:

    static double R_ValueOfNA(void)
    {
        /* The gcc shipping with Fedora 9 gets this wrong without
         * the volatile declaration. Thanks to Marc Schwartz. */
        volatile ieee_double x;
        x.word[hw] = 0x7ff00000;
        x.word[lw] = 1954;
        return x.value;
    }
    

    Where:

    typedef union
    {
        double value;
        unsigned int word[2];
    } ieee_double;
    

    And hw and lw are the indices of the high and low 32-bit words of the double: 0 and 1 on big-endian platforms, 1 and 0 on little-endian ones.
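
    The word order can be checked directly. The following is a minimal sketch, not R's code: R fixes hw and lw at compile time, whereas the hypothetical helpers high_word_index and low_word_index below probe the layout at run time, assuming a 64-bit IEEE 754 double.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper (not part of R): index of the 32-bit word that
     * holds the sign, exponent and top mantissa bits of a double. */
    static int high_word_index(void)
    {
        const double one = 1.0;   /* bit pattern 0x3ff0000000000000 */
        uint32_t word[2];

        assert(sizeof one == sizeof word);   /* assumes a 64-bit double */
        memcpy(word, &one, sizeof word);
        /* The high word of 1.0 is 0x3ff00000; the low word is 0. */
        return word[0] == 0x3ff00000u ? 0 : 1;
    }

    /* Hypothetical helper: the remaining index, i.e. the low word. */
    static int low_word_index(void)
    {
        return 1 - high_word_index();
    }
    ```

    On a little-endian machine such as x86-64, high_word_index() should return 1, which corresponds to hw being 1 there.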

    And, furthermore:

    /* is a value known to be a NaN also an R NA? */
    int attribute_hidden R_NaN_is_R_NA(double x)
    {
        ieee_double y;
        y.value = x;
        return (y.word[lw] == 1954);
    }
    
    int R_IsNA(double x)
    {
        return isnan(x) && R_NaN_is_R_NA(x);
    }
    
    int R_IsNaN(double x)
    {
        return isnan(x) && ! R_NaN_is_R_NA(x);
    }
    

    (src/main/arithmetic.c)
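
    To try the payload check in isolation, here is a small standalone sketch that mirrors the logic above without R's headers. The helper names (from_bits, to_bits, make_na, nan_is_na, is_na, is_nan_not_na) are made up for illustration, and the sketch goes through a 64-bit integer with memcpy instead of R's union, which sidesteps the hw/lw word order. It assumes 64-bit IEEE 754 doubles.

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* All names below are hypothetical; this only mirrors R's logic. */

    /* Reinterpret a 64-bit pattern as a double (assumes 64-bit IEEE 754). */
    static double from_bits(uint64_t bits)
    {
        double d;
        assert(sizeof d == sizeof bits);
        memcpy(&d, &bits, sizeof d);
        return d;
    }

    /* Reinterpret a double as its 64-bit pattern. */
    static uint64_t to_bits(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        return bits;
    }

    /* Same pattern as R_ValueOfNA: high word 0x7ff00000, low word 1954. */
    static double make_na(void)
    {
        return from_bits(((uint64_t)0x7ff00000u << 32) | 1954u);
    }

    /* Like R_NaN_is_R_NA: compare only the low 32 bits with 1954. */
    static int nan_is_na(double x)
    {
        return (uint32_t)(to_bits(x) & 0xffffffffu) == 1954u;
    }

    static int is_na(double x)         { return isnan(x) && nan_is_na(x); }
    static int is_nan_not_na(double x) { return isnan(x) && !nan_is_na(x); }
    ```

    With these helpers, isnan() is true for both make_na() and the C macro NAN, but is_na() is true only for the former. This is the distinction that lets R print NA for one value and "NaN" for the other.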