I'm in the process of analyzing costs that are produced by the usage of Data Studio (now Looker Studio). For that I'm looking at the BigQuery audit logs.
I can find the query that Data Studio is firing, which may look something like this:
SELECT * FROM (
SELECT clmn10_, SUM(clmn26_) AS clmn100000_ FROM (
SELECT * FROM (
SELECT t0.customer AS clmn10_, t0.my_metric1 AS clmn26_, t0.my_dim2 AS clmn39_, t0.product AS clmn6_ FROM (
SELECT *
FROM `mytable`
WHERE date(_PARTITIONTIME) >= PARSE_DATE('%Y%m%d', @DS_START_DATE)
AND date(_PARTITIONTIME) <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
) t0
) WHERE ((clmn6_ = "value1") AND (clmn39_ = false))
) GROUP BY clmn10_
) LIMIT 20000000
Now that's a great start, but I would like to better understand the usage of @DS_START_DATE and @DS_END_DATE.
Strangely enough these variables show up in the logs like above, when I would have expected them to show the resolved values (i.e., the actual start and end dates used).
Is there a way to get the values of these variables for the queries?
Any hints are greatly appreciated.
You can extract the time range of the customized Looker Studio by
looking at the INFORMATION_SCHEMA.JOBS
and the job_stages
column. After several unnest operation, you can extract the where
condition. I always use i n Looker Studio the where date(column) between ... and ...
.
SELECT * except(substeps),
DATE_FROM_UNIX_DATE(safe_cast(split(substep,",")[safe_offset(1)] as int64)) as from_date,
DATE_FROM_UNIX_DATE(safe_cast(replace(split(substep,",")[safe_offset(2)],")","") as int64)) as to_date
from (
select
end_time,
stage.*,
array_to_string(substeps, " ") as query
from `region-eu`.INFORMATION_SCHEMA.JOBS, unnest(job_stages) as job_stage,unnest(job_stage.steps) as stage
where stage.kind="READ"
), unnest(substeps) as substep
where substep like "%date($%"
order by end_time desc