I am planning to have a Cloud Scheduler job that triggers a GCP workflow every day at 8 a.m. My workflow will have around 15 different steps, all of them transformations (update, delete, insert) on BigQuery. Some of the queries will be quite long, so I am wondering: is there a way to load a .sql file into a GCP Workflows task1.yaml?
#workflow entrypoint
ProcessItem:
  params: [project, gcsPath]
  steps:
    - initialize:
        assign:
          - dataset: wf_samples
          - input: ${gcsPath}
          - sqlQuery: QUERY HERE
...
You need to do something similar (and of course you can assign the result to a variable such as input):
#workflow entrypoint
main:
  steps:
    - getSqlfile:
        call: http.get
        args:
          url: https://raw.githubusercontent.com/jisaw/sqlzoo-solutions/master/select-in-select.sql
          headers:
            Content-Type: "text/plain"
        result: queryFromFile
    - final:
        return: ${queryFromFile.body}
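Once the file content is available in queryFromFile.body, you can hand it straight to BigQuery. Here is a minimal sketch of such a step using the googleapis.bigquery.v2.jobs.insert connector; the project variable, step name, and query configuration are assumptions to adapt to your setup:
    - runQuery:
        call: googleapis.bigquery.v2.jobs.insert
        args:
          projectId: ${project}
          body:
            configuration:
              query:
                # SQL text fetched by the previous step
                query: ${queryFromFile.body}
                useLegacySql: false
        result: queryResult
The connector waits for the job to finish before the workflow moves on, which fits chaining your ~15 sequential transformation steps.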
For Cloud Storage, that may look like:
call: http.get
args:
  url: https://storage.cloud.google.com/................./q1.sql
  headers:
    Content-Type: "text/plain"
  auth:
    type: OIDC
result: queryFromFile
Or even with this format (different URL syntax plus OAuth2, the auth type Workflows uses against *.googleapis.com endpoints):
call: http.get
args:
  url: https://storage.googleapis.com/................./q1.sql
  headers:
    Content-Type: "text/plain"
  auth:
    type: OAuth2
result: queryFromFile
Make sure that the invoker (the workflow's service account) has the right permissions to access the Cloud Storage file.
Note: on further testing, for this to work correctly the text/plain MIME type must be set on the GCS file.
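If the file was uploaded without that MIME type, you can set it afterwards with gsutil (the bucket and object path below are placeholders):
gsutil setmeta -h "Content-Type:text/plain" gs://your-bucket/q1.sql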