ramazon-web-servicesamazon-dynamodbrjdbcrpython

Connecting to DynamoDB using R


I would like to connect to DynamoDB with R. My ultimate goal is to create a Shiny App to display data that is stored at DynamoDB and updated frequently. So I need an efficient way to retrieve it using R.

The following references give an intuition but they do not include a native implementation in R and have not been updated for a long time.

r language support for AWS DynamoDB
AWS dynamodb support for "R" programming language
R + httr and EC2 api authentication issues

As mentioned in the answers above, running Python within R through rPython would be an option as there are SDKs for Python such as boto3.

Another alternative would be using a JDBC driver through RJDBC, which I tried:

library(RJDBC)

drv <- JDBC(
  driverClass = "cdata.jdbc.dynamodb.DynamoDBDriver", 
  classPath = "MyInstallationDir\lib\cdata.jdbc.dynamodb.jar", 
  identifier.quote = "'"
)

conn <- dbConnect(
  drv,
  "Access Key=xxx;Secret Key=xxx;Domain=amazonaws.com;Region=OREGON;"
) 

(Access Key and Secret Key replaced by xxx) and I got the error:

Error in .verify.JDBC.result(jc, "Unable to connect JDBC to ", url) : 
  Unable to connect JDBC to Access Key=xxx;Secret 
Key=xxx;Domain=amazonaws.com;Region=OREGON;

What would be the best practice in this matter? Is there a working, native solution for R? I would appreciate if anyone could point me in the right direction.

Note: The package aws.dynamodb (https://github.com/cloudyr/aws.dynamodb) looks promising but the documentation lacks examples and I could not find any tutorial for it.


Solution

  • I would like to share some updates so that people with the same issue can benefit from this post:

    First, I figured out how to use the JDBC driver with a few tweaks:

    library(DBI)
    library(RJDBC)
    
    drv <- JDBC(
      driverClass = "cdata.jdbc.dynamodb.DynamoDBDriver", 
      classPath = "/Applications/CData/CData JDBC Driver for DynamoDB 2018/lib/cdata.jdbc.dynamodb.jar", 
      identifier.quote = "'"
    )
    
    conn <- dbConnect(
      drv, 
      url = 'jdbc:dynamodb: Access Key=xxx; SecretKey=xxx; Domain=amazonaws.com; Region=OREGON;'
    )
    
    dbListTables(conn)
    

    Second, I realized that reticulate makes it very convenient (even more than rPython) to run Python code inside R and ended up using reticulated boto3 to get data from DynamoDB into R. You can refer to the following documentations for additional info:
    reticulate
    boto3 - DynamoDB

    Last, I heard that RStudio is planning to build a NoSQL database driver (which would be compatible with DBI, dbplyr, pool etc.) but probably it won't be available sometime soon.

    Hope someone will create an R package as comprehensive as boto3 for AWS as it gets more and more popular.