How do I get a simple 'bigrquery' function using public data to work? I just want to get ANY 'bigrquery' function to return data from Google Cloud Platform Big Query.
I started by looking at this example from bigrquery as a starting point (I already have the package installed):
library(bigrquery)
billing <- bq_test_project() # replace this with your project ID
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"
tb <- bq_project_query(billing, sql)
bq_table_download(tb, n_max = 10)
#> First chunk includes all requested rows.
#> # A tibble: 10 x 4
#> year month day weight_pounds
#> <int> <int> <int> <dbl>
#> 1 1969 1 20 7.00
#> 2 1969 1 27 7.69
#> 3 1969 6 19 6.75
#> 4 1969 5 30 6.19
#> 5 1969 11 9 7.87
#> 6 1969 5 25 7.06
#> 7 1969 7 25 7.94
#> 8 1969 9 11 7.06
#> 9 1969 7 13 6.00
#> 10 1969 9 27 8.13
I know that billing
needs to be a project ID associated with my account on GCP, so I navigated per the instructions under "Billing project" from one of my own test projects to get project ID:
So here's what I did:
library(bigrquery)
bq_auth(email = "my-google-account@gmail.com") # Not a step listed,
# but I saw this elsewhere and thought it might be necessary.
billing <- "tidy-XXXXX" # my project ID from the screenshot above
sql <- "SELECT year, month, day, weight_pounds FROM `publicdata.samples.natality`"
tb <- bq_project_query(billing, sql) # Cannot get past this point without Error
bq_table_download(tb, n_max = 10)
Running bq_auth()
caused my browser to activate and have me verify myself in the browser once, but not on subsequent runs (which is the expected behavior, I think).
Running bq_project_query()
gets me this:
> tb <- bq_project_query(billing, sql)
Error in `signal_reason()`:
! Request had insufficient authentication scopes. [insufficientPermissions]
Run `rlang::last_error()` to see where the error occurred.
> last_error()
<error/bigrquery_insufficientPermissions>
Error in `signal_reason()`:
! Request had insufficient authentication scopes. [insufficientPermissions]
---
Backtrace:
1. bigrquery::bq_project_query(billing, sql)
2. bigrquery::bq_perform_query(...)
3. bigrquery:::bq_post(url, body = bq_body(body, ...), query = list(fields = "jobReference"))
4. bigrquery:::process_request(req)
5. bigrquery:::bq_check_response(status, type, content)
6. bigrquery:::signal_reason(json$error$errors[[1L]]$reason, json$error$message)
The SQL query is good as it works in the GCP console. Under "Authentication and authorization" I see a reference to cache inside a folder ~/.R/gargle/gargle-oauth/
, but I'm on Windows and not finding that file/folder, but under "Details", it seems I shouldn't need to worry about email or bq_auth()
at all. I don't have or don't think I need "API credentials" because I'm not writing an application -- I'm just using this in RStudio interactively (well, .R scripts -- but that's the same, right?). I'm not trying to use a service account (should I? I thought my user account would be sufficient).
I'd like to see what is happening in the backtrace in bigrquery:::bq_check_response(status, type, content)
, but I don't know how to step into that particular code (I can step through code I write myself -- but not a package function?)
I actually have a work use 'bigrquery', but unless I can get this figured out, I'll never get that working. I feel like I'm missing something obvious, because I don't see this issue popping up in the SO questions that I've already searched.
Maybe progress? I feel that this token should not be NULL
below, but still not sure how to fix it. First, I
library(bigrquery)
options(error = recover) # ADDED TO ALLOW BROWSING OF CALL FRAMES
bq_auth(email = "my-google-account@gmail.com")
billing <- "tidy-XXXXXX"
# Error comes on next line:
tb <- bq_project_query(x = billing,
"SELECT count(*) FROM publicdata.samples.natality"
)
The error I get is the same as earlier in the post, but options(error = recover)
lets me access the frames:
> library(bigrquery)
> options(error = recover)
> bq_auth(email = "my-google-account@gmail.com")
> billing <- "tidy-XXXXXX"
> tb <- bq_project_query(
+ x = billing,
+ "SELECT count(*) FROM publicdata.samples.natality"
+ )
Error in `signal_reason()`:
! Request had insufficient authentication scopes. [insufficientPermissions]
Run `rlang::last_error()` to see where the error occurred.
Enter a frame number, or 0 to exit
1: source("C:/SO_QuestionScript.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: GoogleCloudPlatformQuery_Stack_Overflow_Question.R#9: bq_project_query(x = billing, "SELECT count(*) FROM publicdata.samples.natality")
6: bq_perform_query(query, billing = x, destination_table = destination_table, ...)
7: bq_post(url, body = bq_body(body, ...), query = list(fields = "jobReference"))
8: process_request(req)
9: bq_check_response(status, type, content)
10: signal_reason(json$error$errors[[1]]$reason, json$error$message)
11: rlang::abort(message, class = paste0("bigrquery_", reason))
12: signal_abort(cnd, .file)
Selection: 7
Called from: eval(substitute(browser(skipCalls = skip), list(skip = 7 - which)),
envir = sys.frame(which))
Browse[1]> dput(token)
structure(list(method = NULL, url = NULL, headers = NULL, fields = NULL,
options = list(), auth_token = <environment>, output = NULL), class = "request")
Browse[1]>
If I dig a little more on the internals of bq_post()
, I can see that the req
variable is giving me the status code 403, which is my problem:
Browse[1]> bq_post
function (url, body, ..., query = NULL, token = bq_token())
{
json <- jsonlite::toJSON(body, pretty = TRUE, auto_unbox = TRUE)
req <- POST(paste0(base_url, url), body = json, httr::user_agent(bq_ua()),
token, add_headers(`Content-Type` = "application/json"),
..., query = prepare_bq_query(query))
invisible(process_request(req))
}
Browse[1]> req$status_code
[1] 403
Browse[1]> req$request
<request>
POST https://www.googleapis.com/bigquery/v2/projects/tidy-orbit-173301/jobs/?fields=jobReference
Output: write_memory
Options:
* post: TRUE
* postfieldsize: 177
* postfields: as.raw(c(0x7b, 0x0a, 0x20, 0x20, 0x22, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x22, 0x3a, 0x20, 0x7b, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x22, 0x71, 0x75, 0x65, 0x72, 0x79, 0x22, 0x3a, 0x20, 0x7b, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x22, 0x71, 0x75, 0x65, 0x72, 0x79, 0x22, 0x3a, 0x20, 0x22, 0x53, 0x45, 0x4c, 0x45, 0x43, 0x54, 0x20, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x28, 0x2a, 0x29, 0x20, 0x46, 0x52, 0x4f, 0x4d, 0x20, 0x70, 0x75, 0x62, 0x6c, 0x69, 0x63, 0x64,
0x61, 0x74, 0x61, 0x2e, 0x73, 0x61, 0x6d, 0x70, 0x6c, 0x65, 0x73, 0x2e, 0x6e, 0x61, 0x74, 0x61, 0x6c, 0x69, 0x74, 0x79, 0x22, 0x2c, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x22, 0x75, 0x73, 0x65, 0x4c, 0x65, 0x67, 0x61, 0x63, 0x79, 0x53, 0x71, 0x6c, 0x22, 0x3a, 0x20, 0x66, 0x61, 0x6c, 0x73, 0x65, 0x2c, 0x0a, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x22, 0x70, 0x72, 0x69, 0x6f, 0x72, 0x69, 0x74, 0x79, 0x22, 0x3a, 0x20, 0x22, 0x49, 0x4e, 0x54, 0x45, 0x52, 0x41, 0x43, 0x54, 0x49, 0x56, 0x45, 0x22, 0x0a,
0x20, 0x20, 0x20, 0x20, 0x7d, 0x0a, 0x20, 0x20, 0x7d, 0x0a, 0x7d))
* useragent: bigrquery/1.4.0 (GPN:RStudio; ) gargle/1.2.0 httr/1.4.2
Headers:
* Accept: application/json, text/xml, application/xml, */*
* Content-Type: application/json
* Authorization: Bearer ya29.a0ARrdaM_Z0AIQ7950HcZ87goJJd8NorjH8m5JJ3BTTjj3hwAxH_PrXlTfWKjbYLXx-OfCiIk2BwAtN5trJh7FHf--[MORE STRING YOU DON'T NEED]
Auth token: Gargle2.0
The Authorization: Bearer
looks like I have a token, but maybe it's not set up correctly?
Okay, I figured out that I didn't properly authorize Google during the initial authentication when it goes from RStudio to the browser and then back.
You MUST check these two boxes which I blew off originally:
In R, you'll need to bq_deauth()
your existing credentials, and when you run another BigRQuery function, it will again ask you if you want to use stored credentials or create new ones by entering '0'. Choose '0'. So it should look like this:
> bq_deauth()
> con <- dbConnect(
+ bigrquery::bigquery(),
+ project = "bigquery-public-data",
+ dataset = "baseball",
+ billing = billing
+ )
> dbListTables(con)
The bigrquery package is requesting access to your Google account.
Select a pre-authorised account or enter '0' to obtain a new token.
Press Esc/Ctrl + C to cancel.
1: scott.jackson@workemail.com
2: scott.jackson@gmail.com
Selection: 0
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.
[1] "games_post_wide" "games_wide" "schedules" # OUTPUT I WANTED!!!!!!
After you select '0', the browser has you re-authenticate, and Bob's your uncle!