We currently use a service account JSON key to access BigQuery from RStudio non-interactively. As you may know, this method has inherent security issues: anyone who obtains the key can access BigQuery from any machine. So I was wondering, is there a way to access BigQuery data in RStudio only if the request comes from a particular Compute Engine instance in GCP? Does Google offer a way to authenticate via Compute Engine without any keys? If so, how can I use it from RStudio non-interactively?
Update 2022-01-05:
You can simply set the option below at the beginning of your script, or in .Rprofile so that it is applied automatically on startup:
options(gargle.gce.timeout = 2)
Note - on Ubuntu, the full path of Rprofile.site is /usr/lib/R/etc/Rprofile.site.
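With that option set, a minimal sketch of a non-interactive script (assuming the instance's attached service account has BigQuery access) looks like this:

library(bigrquery)

# Give gargle enough time to fetch a token from the GCE metadata server
options(gargle.gce.timeout = 2)

# On a Compute Engine VM, bq_auth() picks up the instance's service account
# via gargle's credentials_gce(), so no JSON key is needed
bq_auth()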
##########################Old Content#########################
Update - I was able to perform keyless auth using Application Default Credentials, as DazWilkin had pointed out, but doing so was not straightforward. bigrquery's bq_auth() function uses the gargle package for authentication, and gargle has a function called credentials_gce() to authenticate using the service account attached to a Compute Engine instance. In theory, when running on a Compute Engine instance whose service account has access to BigQuery, bq_auth() calls credentials_gce() and auth should just work seamlessly, without any additional parameters.

In practice, as of v1.2.0 of gargle, I've found a bug: gargle uses a default timeout of 0.8 seconds when requesting a temporary token from Google's metadata server, which is so low that the request times out before a legitimate response arrives from the internal metadata service. For now, I've found that adding the lines below before bq_auth() solves the problem. In fact, you don't need to call bq_auth() separately, because bigrquery runs bq_auth() automatically the moment it encounters an operation that requires authentication, but I've found that in some cases, when combined with the googleCloudStorageR package, bq_auth() doesn't work as expected, so I call it explicitly.
library(gargle)
library(bigrquery)

# Warm up the GCE metadata server with a longer timeout than gargle's 0.8 s default,
# so the subsequent token request does not time out
invisible(
  try({
    httr::with_config(httr::timeout(getOption("gargle.gce.timeout", default = 2)), {
      httr::GET(
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
        httr::add_headers("Metadata-Flavor" = "Google")
      )
    })
  },
  silent = TRUE)
)

bq_auth()
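Once bq_auth() succeeds, any subsequent bigrquery call authenticates with the instance's service account. As a quick sanity check (a hedged example; it assumes the service account can list projects), you could run:

# Confirm keyless auth works: list projects visible to the service account
bq_projects()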
As an additional note, if someone is looking to do the same with googleCloudStorageR, the R package for interacting with Google Cloud Storage, the code is as follows. Note that googleCloudStorageR uses the googleAuthR package for authentication, so calling gcs_auth() separately is not needed.
library(googleAuthR)

# Same metadata-server warm-up as above, to avoid gargle's short default timeout
invisible(
  try({
    httr::with_config(httr::timeout(getOption("gargle.gce.timeout", default = 2)), {
      httr::GET(
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
        httr::add_headers("Metadata-Flavor" = "Google")
      )
    })
  },
  silent = TRUE)
)

# Authenticate with the instance's default service account via the metadata server
gar_gce_auth(
  scopes = "https://www.googleapis.com/auth/bigquery",
  service_account = "default"
)
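After gar_gce_auth() returns a token, googleCloudStorageR calls should work with the instance's service account. A minimal check (the project ID is a placeholder; it assumes the service account can list buckets) could be:

library(googleCloudStorageR)

# Confirm keyless auth works by listing buckets in the project
gcs_list_buckets(projectId = "your-project-id")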