I am running into an issue with BigQuery parameterization. I am passing a start date and end date to the function, along with an array of potential fields present in the database. Start and end date formatted as "yyyymmdd".
The goal is to be able to pass a set of dates and a set of fields and collect the data relating to the array of fields between the two dates.
Date operation works as intended.
The fields array is passed like so: ["user_pseudo_id", "event_name", "event_timestamp"] as an example (other entries in array possible)
Effectively, I want to further parameterize the query to look similar to the below, where the @search_params replaces individual variables in the SELECT portion of the query. The intent is for the fields array to be more scalable, from a single entry to multiple.
From my searching, I believe ArrayQueryParameter (in place of ScalarQueryParameter) would solve the issue, but I haven't found much documentation for the usage.
query_job = client.query("""
SELECT @search_params, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*`
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
BETWEEN @start_date AND @end_date
""", job_config=job_config)
FULL FUNCTION BELOW
def query_awe(start_date, end_date, fields):
credentials = service_account.Credentials.from_service_account_file('auth.json')
project_id = 'my-project-id'
client = bigquery.Client(credentials=credentials, project=project_id)
search_params = ""
for i in fields:
search_params += i + ", "
search_params = search_params[:-2]
query_params = [
bigquery.ScalarQueryParameter('start_date', 'STRING', start_date),
bigquery.ScalarQueryParameter('end_date', 'STRING', end_date),
bigquery.ScalarQueryParameter('search_params', 'STRING', search_params),
]
bigquery.ArrayQueryParameter
job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = False
job_config.query_parameters = query_params
query_job = client.query("""
SELECT user_pseudo_id, event_name, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*` #Each day saved as events_yyyymmdd
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
BETWEEN @start_date AND @end_date
ORDER BY user_pseudo_id DESC
""", job_config=job_config)
results = query_job.result() # Waits for job to complete.
for row in results:
print(row)
What about just using string formatting?
def query_awe(start_date, end_date, fields):
credentials = service_account.Credentials.from_service_account_file('auth.json')
project_id = 'my-project-id'
client = bigquery.Client(credentials=credentials, project=project_id)
job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = False
my_query = """
SELECT {0}, _TABLE_SUFFIX AS suffix
FROM `analytics_180354243.events_*` #Each day saved as events_yyyymmdd
WHERE REGEXP_EXTRACT(_TABLE_SUFFIX, r'[0-9]+')
BETWEEN {1} AND {2}
ORDER BY user_pseudo_id DESC
"""
my_query = my_query.format(', '.join(fields), start_date, end_date)
query_job = client.query(my_query, job_config=job_config)
results = query_job.result() # Waits for job to complete.
for row in results:
print(row)