I am trying to download a BigQuery data set from Google Cloud Platform into my R workspace in order to analyze it, using the following code:
library(bigrquery)
library(DBI)
library(tidyverse)
library(dplyr)
con = dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "new_york_citibike",
  billing = "maanan-bigquery-in-r"
)
bigrquery::bq_auth()
my_db_pointer = tbl(con, "citibike_trips")
glimpse(my_db_pointer)
count(my_db_pointer)
selected = select(my_db_pointer, everything()) %>% collect()
However, when I try to run the last line in order to download the data, it returns the following error:
Complete
Billed: 0 B
Downloading first chunk of data.
Received 55,308 rows in the first chunk.
Downloading the remaining 58,882,407 rows in 1420 chunks of (up to) 41,481 rows.
Downloading data [=====>--------------------------------------------------------------------------------------------------] 6% ETA: 19m
Error in `signal_reason()`:
! Exceeded rate limits: Your project:453562790213 exceeded quota for tabledata.list bytes per second per project. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors [rateLimitExceeded]
ℹ Try increasing the `page_size` value of `bq_table_download()`
Run `rlang::last_error()` to see where the error occurred.
I would be very grateful if someone could help me fix this error and download the data. I need to analyze the data set. Thank you in advance.
As per the documentation link about rateLimitExceeded, it looks like you are breaking the threshold for query jobs.
Please consider the following:
Check whether your project's BigQuery API has limits and quotas set up that you might be exceeding when performing the operation. To see your current quotas and limits, go to IAM & Admin > Quotas > Quotas for project "projectid" > bigquery.googleapis.com.
Since your chunks are about 55,308 rows each out of 58,882,407 rows in total, it appears you are trying to download far more data than the service allows, and you might be hitting the following limits: Query/script execution-time limit, Maximum response size, Maximum row size.
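Before retrying the full download, it can help to confirm how much data the table actually holds. Below is a minimal sketch using bigrquery's bq_table_nrow() and bq_table_size() helpers on the public table from the question (assuming you have already authenticated with bq_auth()):

library(bigrquery)
# Reference the public table from the question
trips <- bq_table("bigquery-public-data", "new_york_citibike", "citibike_trips")
bq_table_nrow(trips)   # total number of rows in the table
bq_table_size(trips)   # approximate size of the table

This gives you a sense of how much data you are asking bigrquery to pull down before you commit to a very long download.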
Verify that the table constraints are not being reached, especially the one about table operations per day.
Check the number of columns your rows have. There is a limit of 10,000 columns.
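As a quick sanity check against that limit, you can count the table's columns with bigrquery's bq_table_fields(); a small sketch, reusing the same table reference as above:

library(bigrquery)
trips <- bq_table("bigquery-public-data", "new_york_citibike", "citibike_trips")
# Number of columns in the table (the limit is 10,000)
length(bq_table_fields(trips))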
Consider checking the rest of the quota limits specified for query jobs.
Reduce the scope of your select or reduce the size of your chunks. Do you truly need every column of a table with millions of rows? You can perform something like this:
library(bigrquery)
# authenticate
# use if notebook is outside GCP
# bigrquery::bq_auth(path = '/Users/me/restofthepath/bigquery-credentials.json')
bq_table_download("my-project-id.dataset-id.table", page_size = 100)
For additional details about this function, check bq_table_download.
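If you prefer to keep the dplyr workflow from the question, another option is to push the column selection and a row limit to BigQuery before collecting, so only the data you actually need is downloaded. The sketch below is only illustrative: the column names are examples and would need to match the real schema of citibike_trips.

library(bigrquery)
library(DBI)
library(dplyr)

con <- dbConnect(
  bigrquery::bigquery(),
  project = "bigquery-public-data",
  dataset = "new_york_citibike",
  billing = "maanan-bigquery-in-r"
)

citibike <- tbl(con, "citibike_trips")

# Select only the columns you need and cap the number of rows,
# so the result stays well below the response-size limits.
trips_subset <- citibike %>%
  select(starttime, stoptime, tripduration, start_station_name) %>%  # example columns
  head(1e6) %>%   # becomes LIMIT 1000000 in the generated SQL
  collect()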