64-bit integer support in feather


Can the feather package in R support 64-bit integers?

When the dataset is passed to feather::write_feather(), the column is converted to a 64-bit float, and loses precision. I'd like to avoid converting it to a character.

Here's a simplified example. In the real project, a database table (retrieved with the odbc package) has columns that are legit 64-bit integers (as specified in the bit64 package).

requireNamespace("bit64")

path <- base::tempfile(fileext = ".feather")

ds <-
  tibble::tibble(
    patient_id   = bit64::as.integer64(1:6)
  )
ds

# # A tibble: 6 x 1
#   patient_id
#   <int64>   
# 1 1         
# 2 2         
# 3 3         
# 4 4         
# 5 5         
# 6 6 

feather::write_feather(x = ds, path = path)

ds_read <- feather::read_feather(path)
ds_read
# # A tibble: 6 x 1
#    patient_id
#         <dbl>
# 1 Inf.Nae-324
# 2 Inf.Nae-324
# 3   1.50e-323
# 4   2.00e-323
# 5   2.50e-323
# 6   3.00e-323


as.integer(ds_read$patient_id)
# Returns: [1] 0 0 0 0 0 0

unlink(path)

Note: I don't want to store them as floats, as suggested here.


Solution

  • It is actually "complicated". As you probably know, R itself has only two numeric types: 32-bit integer and 64-bit double.

    So to represent 64-bit integers, Jens did quite some work in his bit64 package to use double as a "carrier" for the 64-bit payload and to redefine all accessor functionality so that the payload is treated as a 64-bit (signed) integer. That works.

    Several packages support it natively, for example data.table. I took advantage of this when I created nanotime -- which uses 64-bit integers for nanoseconds since the epoch. This also works: we never convert to double in between and get faithful integer64 representation.

    I have also been following reticulate over the years, and it has very similar conversion issues from 64-bit integers (as those are native in Python) which are by now generally addressed.

    So long story short: your question is more of a feature request for feather. And since those involved now focus on arrow, which appears to have 64-bit integer support, you will most likely just be asked to move to arrow. Or you could use data.table.
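
To make the "carrier" idea concrete, here is a minimal sketch, assuming only that the bit64 package is installed. It strips the integer64 class to expose the underlying doubles, which are tiny denormal values of the same kind as the ones feather returned in the question, and then reattaches the class to reinterpret the same bits as integers. The variable names are illustrative only.

```r
# Sketch only: assumes the bit64 package is installed.
library(bit64)

x <- as.integer64(1:6)

typeof(x)   # storage type of the carrier vector
class(x)    # S3 class that bit64 dispatches on

# Stripping the class exposes the raw doubles that carry the 64-bit payload.
# These are tiny denormal numbers, not 1..6.
carrier <- unclass(x)
print(carrier)

# Truncating the carrier to integer discards the payload entirely.
as.integer(carrier)

# Reattaching the class reinterprets the same bits as 64-bit integers again.
restored <- carrier
class(restored) <- "integer64"
print(restored)
```

Whether the same reattachment would recover a column read back by feather depends on the file preserving those doubles bit for bit, which is an assumption that would need checking before relying on it.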
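
The suggested move to arrow could look roughly like the sketch below. It assumes the arrow, bit64 and tibble packages are installed and that arrow::write_feather() and arrow::read_feather() map integer64 columns to Arrow's int64 type and back. The values are chosen above 2^53 on purpose: a double cannot hold them exactly, and arrow documents an option (arrow.int64_downcast) under which int64 columns whose values all fit in 32 bits may come back as plain R integers, so small values such as 1:6 may not return as integer64.

```r
# Sketch only: assumes the arrow, bit64 and tibble packages are installed.
requireNamespace("bit64")

path <- tempfile(fileext = ".feather")

# Values above 2^53 cannot be represented exactly as doubles.
ds <- tibble::tibble(
  patient_id = bit64::as.integer64(c("9007199254740993", "9007199254740995"))
)

arrow::write_feather(ds, path)
ds_read <- arrow::read_feather(path)

# Inspect the class of the column and compare it with the original values.
class(ds_read$patient_id)
all(ds_read$patient_id == ds$patient_id)

unlink(path)
```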