
R arrow: Error: Support for codec 'snappy' not built


I have been using the latest R arrow package (arrow_2.0.0.20201106), which supports reading from and writing to AWS S3 directly (which is awesome).

I don't seem to have issues when I write and read my own file (see below):

write_parquet(iris, "iris.parquet")
system("aws s3 mv iris.parquet s3://myawsbucket/iris.parquet")
df <- read_parquet("s3://myawsbucket/iris.parquet")

But when I try to read in one of the sample R arrow files, I get the following error:

df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
Error in parquet___arrow___FileReader__ReadTable1(self) : 
  IOError: NotImplemented: Support for codec 'snappy' not built

When I check if the codec is available, it looks like it is not:

codec_is_available(type="snappy")
[1] FALSE

Anyone know a way to make the "snappy" codec available?

Thanks, Mike

###########

Follow up

Thanks to the answer from @Neal below, here is the code that installed all the needed dependencies for me:

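# Build the bundled Arrow C++ library with S3 support
Sys.setenv(ARROW_S3="ON")
# Treat this as a non-CRAN install, which enables the fuller build (optional features such as compression codecs)
Sys.setenv(NOT_CRAN="true")
install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")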
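
To confirm that a reinstall like this actually picked up the compression libraries, the codec check from the question can be repeated for several codecs at once. This is only a minimal sketch: `check_codecs` is a hypothetical helper name (not part of arrow), the list of codecs is an assumption about what you care about, and the `is_available` argument exists only so the check can be swapped out; by default it calls `arrow::codec_is_available()`.

```r
# Hypothetical helper (not part of arrow): report which codecs this build supports.
# `is_available` defaults to arrow::codec_is_available and is only looked up
# when the helper is called without an override.
check_codecs <- function(codecs = c("snappy", "gzip", "zstd", "lz4"),
                         is_available = arrow::codec_is_available) {
  # Returns a named logical vector, one entry per codec
  vapply(codecs, is_available, logical(1))
}
```

In a fresh R session after the reinstall, `check_codecs()` should return TRUE for "snappy" if the build included it; if it is still FALSE, the package was most likely built without that compression library.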

Solution

  • I'm assuming you're on Linux, since the macOS and Windows binary packages have snappy support. Is that right? Usually, if you've installed the Linux package with S3 support, you've also built all of the compression libraries, but it is possible to build S3 support without the compression libs. How exactly did you install the package?

    https://arrow.apache.org/docs/r/articles/install.html may be a useful reference.

    Side note: you can just write_parquet(iris, "s3://myawsbucket/iris.parquet"), no need to write to a local file and shell out to copy it to S3.