Tags: r, floating-point, na, internal-representation

How does R represent NA internally?


R seems to support an efficient NA value in floating point arrays. How does it represent it internally?

My (perhaps flawed) understanding is that modern CPUs can carry out floating point calculations in hardware, including efficient handling of Inf, -Inf and NaN values. How does NA fit into this, and how is it implemented without compromising performance?


Solution

  • With IEEE 754 doubles, +Inf and -Inf are represented with all bits in the exponent (bits 2 through 12, counting from the sign bit) set to one and all bits in the mantissa set to zero, whereas NaN has the same all-ones exponent but a non-zero mantissa. R uses different mantissa values to distinguish an ordinary NaN from NA_real_. We can use a simple C++ function to make this explicit:

    Rcpp::cppFunction('void print_hex(double x) {
        uint64_t y;
        static_assert(sizeof x == sizeof y, "Size does not match!");
        std::memcpy(&y, &x, sizeof y);
        Rcpp::Rcout << std::hex << y << std::endl;
    }', plugins = "cpp11", includes = c("#include <cstdint>", "#include <cstring>"))
    print_hex(NA_real_)
    #> 7ff00000000007a2
    print_hex(NaN)
    #> 7ff8000000000000
    print_hex(Inf)
    #> 7ff0000000000000
    print_hex(-Inf)
    #> fff0000000000000
    

    Here are some source code references.
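To connect this back to the performance question: since NA_real_ is just a NaN with a particular mantissa, the hardware treats it like any other NaN in arithmetic, and only code that needs to tell NA from NaN has to look at the bits. The following standalone C++ sketch illustrates that idea. It assumes the bit pattern printed above (low 32 bits equal to 0x7a2, decimal 1954); the names `kNaBits`, `to_bits`, `from_bits`, `looks_like_na` and `self_check` are hypothetical helpers for illustration, not R's actual API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Bit pattern printed for NA_real_ above: exponent all ones,
// low 32 bits 0x7a2 (decimal 1954). Assumed, taken from that output.
const std::uint64_t kNaBits = 0x7ff00000000007a2ULL;

// Reinterpret a double as its 64-bit pattern (same trick as print_hex).
std::uint64_t to_bits(double x) {
    std::uint64_t y;
    static_assert(sizeof x == sizeof y, "Size does not match!");
    std::memcpy(&y, &x, sizeof y);
    return y;
}

// Build a double from a 64-bit pattern.
double from_bits(std::uint64_t y) {
    double x;
    std::memcpy(&x, &y, sizeof x);
    return x;
}

// Hypothetical check: a value counts as NA if it is a NaN whose
// low 32 bits hold the payload 1954. Only the low word is compared,
// so the check still works if the hardware sets the "quiet" bit.
bool looks_like_na(double x) {
    return std::isnan(x) && (to_bits(x) & 0xffffffffULL) == 0x7a2ULL;
}

// Small sanity check of the classification.
void self_check() {
    const double na = from_bits(kNaBits);
    assert(looks_like_na(na));
    assert(!looks_like_na(std::numeric_limits<double>::quiet_NaN()));
    assert(!looks_like_na(std::numeric_limits<double>::infinity()));
    // Arithmetic on NA is ordinary hardware NaN arithmetic.
    assert(std::isnan(na + 1.0));
}
```

Note that the sketch only asserts that `na + 1.0` is some NaN. Whether the payload survives an arithmetic operation, so that the result is still recognisable as NA rather than a plain NaN, depends on the platform and should not be relied upon.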