
Internal representation of int NA


This is a question about R internals. How are integer NA values represented in R? Unlike floating-point numbers, integers have no spare bit patterns (such as the NaN encodings) that could mark a missing value.

# Create a large integer vector. Newer versions of R won't allocate memory to store the data;
# instead, the start/end values are stored internally.
a <- 1:1e6

# Change an arbitrary element. This will cause the full vector to be allocated.
a[123] <- NA
typeof(a)

At this point a is still an integer vector. How is a[123] represented internally? Does R use some magic number to indicate that an integer is NA?

My primary interest in the internal representation of integers is related to binary read/write (readBin/writeBin). How should NA be handled when performing binary I/O with external sources, e.g. via sockets?


Solution

  • R uses the minimum integer value to represent NA. A 4-byte (32-bit) signed integer normally ranges from -2,147,483,648 to 2,147,483,647, but R reserves -2,147,483,648 for NA, so its smallest usable integer is -2,147,483,647:

    > .Machine$integer.max
    [1] 2147483647
    > -.Machine$integer.max
    [1] -2147483647
    > -.Machine$integer.max - 1L
    [1] NA
    Warning message:
    In -.Machine$integer.max - 1L : NAs produced by integer overflow
    

    Also, inspecting the internal object shows the stored value directly:

    > .Internal(inspect(NA_integer_))
    @7fe69bbb79c0 13 INTSXP g0c1 [NAM(7)] (len=1, tl=0) -2147483648
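
For the binary I/O part of the question, here is a minimal sketch, assuming a 4-byte little-endian wire format. It shows that writeBin emits the minimum-integer bit pattern for NA and that readBin turns the same pattern back into NA. The helper read_int32_as_double is a hypothetical name introduced here for illustration (it is not part of R); it is one possible workaround when the external peer may legitimately send -2147483648 and that value must not be read as NA.

```r
# Serialize an integer vector containing NA to raw bytes (4 bytes per element).
x <- c(1L, NA_integer_, -1L)
bytes <- writeBin(x, raw(), size = 4L, endian = "little")
print(bytes)
# bytes: 01 00 00 00 | 00 00 00 80 | ff ff ff ff
# The middle group is 0x80000000 (little-endian), i.e. the minimum 32-bit integer.

# Read the bytes back; the minimum-integer pattern comes back as NA_integer_.
y <- readBin(bytes, what = "integer", n = length(x), size = 4L, endian = "little")
print(y)

# Hypothetical helper (assumption: little-endian, signed 32-bit values).
# It rebuilds each value as a double, so -2147483648 survives as a number.
read_int32_as_double <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) %% 4L == 0L)
  m <- matrix(as.integer(bytes), nrow = 4L)  # one column per 32-bit value
  # Unsigned value computed in double arithmetic to avoid integer overflow.
  u <- m[1, ] + 256 * m[2, ] + 65536 * m[3, ] + 16777216 * m[4, ]
  # Map the unsigned range back to two's-complement signed values.
  ifelse(u >= 2^31, u - 2^32, u)
}

z <- read_int32_as_double(bytes)
print(z)
```

If the peer has its own missing-value convention, the NA positions can be located with is.na() before writing, or mapped after reading, since the sentinel is just an ordinary 32-bit pattern on the wire.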