The (now-superseded) stand-alone feather library for R had a function called feather_metadata()
that let you read column names and types from feather files on disk without loading their contents.
This was useful for selecting only specific columns when loading a feather file in R with read_feather(path, columns = c(...)).
Now that the feather format is part of the arrow library, feather_metadata()
is not included anymore.
Is there an equivalent function in arrow to read column names and types of files on disk from R before loading them?
In the current version of the arrow R package, there is no direct replacement for feather::feather_metadata(path), but there are two workarounds that might work for you:
If you just need the column names (not the data types), you can do this:
rf <- arrow::ReadableFile$create(path)
fr <- arrow::FeatherReader$create(rf)
names(fr)
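Once you have the names, you can pass a subset of them back to read_feather(). Here is a minimal sketch, assuming a recent arrow release in which the column-selection argument is called col_select (the old feather package called it columns). The temporary file and its column names are made up for illustration only:

```r
library(arrow)

# Hypothetical example file, written only so the snippet runs on its own.
path <- tempfile(fileext = ".feather")
write_feather(
  data.frame(id = 1:3, name = c("a", "b", "c"), score = c(0.5, 1.5, 2.5)),
  path
)

# Read the column names without loading the data.
rf <- ReadableFile$create(path)
fr <- FeatherReader$create(rf)
cols <- names(fr)

# Keep only the columns of interest that are actually present in the file.
wanted <- intersect(cols, c("id", "score"))

# Load just those columns; all_of() makes the character vector explicit.
df <- read_feather(path, col_select = tidyselect::all_of(wanted))
```

The intersect() step is optional; it just avoids an error if one of the requested columns is missing from a particular file.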
If you need the data types of the columns, you can try this:
arrow::read_feather(path, as_data_frame = FALSE)
That returns an Arrow Table rather than a data frame, and printing it shows the column names and types, which is like what you're looking for. It should be pretty fast because it does not convert the file to an R data frame. It does, however, read the full file (or at least memory-map it), so you might not want to do this if your Feather files are really large.
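If you want the types as a plain R object rather than printed output, the Table's schema can be unpacked. This is a rough sketch, assuming the object returned with as_data_frame = FALSE exposes $schema with $names and $fields as in recent arrow versions. The example file is again hypothetical:

```r
library(arrow)

# Hypothetical example file, written only so the snippet runs on its own.
path <- tempfile(fileext = ".feather")
write_feather(data.frame(id = 1:3, name = c("a", "b", "c")), path)

# Read as an Arrow Table (no conversion to an R data frame).
tbl <- read_feather(path, as_data_frame = FALSE)
sch <- tbl$schema

# Build a named character vector: column name -> Arrow type as a string.
types <- vapply(sch$fields, function(f) f$type$ToString(), character(1))
names(types) <- sch$names
types
```

The same caveat applies here: the file is still read or memory-mapped in full, so this only saves the cost of converting to a data frame.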