I am upgrading my Cloud Function to Python 3.12.0. After the upgrade, the Dataflow call started logging the following message:
httplib2 transport does not support per-request timeout. Set the timeout when constructing the httplib2.Http instance.
It does not raise an error, but I would like to get rid of the message.
This is the code I use to launch the Dataflow job that loads a GCS file into a BigQuery table:
from googleapiclient.discovery import build

def run_dataflow_template(project_id, gcsPath, job_name, parameters):
    '''
    run dataflow template
    '''
    dataflow = build('dataflow', 'v1b3', cache_discovery=False)
    request = dataflow.projects().templates().launch(
        projectId=project_id,
        gcsPath=gcsPath,
        body={'jobName': job_name, 'parameters': parameters, }
    )
    return request.execute()
I tried setting a global socket timeout:
import socket
timeout_in_sec = 180
socket.setdefaulttimeout(timeout_in_sec)
but the message still appeared.
After talking to Google support, I was able to resolve it.
In requirements.txt, pin google-auth to an older version; the latest version did not work:
google-auth==2.14.1
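To confirm that the deployed function really picked up the pinned version, a small check like the following can help. This is only a sketch using the standard library's importlib.metadata; the helper names installed_version and matches_pin are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str) -> bool:
    """Return True only if the package is installed at exactly the expected version."""
    return installed_version(package) == expected
```

For example, logging matches_pin("google-auth", "2.14.1") once at startup would show whether the pin from requirements.txt is the version actually in use.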
Use google_auth_httplib2 to handle authentication before instantiating the Dataflow client, with the timeout set on the httplib2.Http instance:
from googleapiclient.discovery import build
import httplib2
from google.auth import default as get_default_credentials
from google_auth_httplib2 import AuthorizedHttp

def run_dataflow_template(project_id, gcsPath, job_name, parameters):
    '''
    run dataflow template
    '''
    # http timeout and credential
    credentials, _ = get_default_credentials()
    http_with_timeout = httplib2.Http(timeout=3600)
    authed_http = AuthorizedHttp(credentials, http=http_with_timeout)
    # use the http in dataflow
    dataflow = build('dataflow', 'v1b3', cache_discovery=False, http=authed_http)
    request = dataflow.projects().templates().launch(
        projectId=project_id,
        gcsPath=gcsPath,
        body={'jobName': job_name, 'parameters': parameters, }
    )
    return request.execute()
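If pinning google-auth is not an option, a different approach would be to filter the message out of the logs. This is not what Google support suggested and is only a sketch. It assumes the message is emitted through Python's standard logging module by a logger named google_auth_httplib2; check the logger name in your own logs before relying on it. The class and function names are made up for illustration.

```python
import logging

# Assumption: the message comes from a logger with this name.
ASSUMED_LOGGER_NAME = "google_auth_httplib2"
MESSAGE_PREFIX = "httplib2 transport does not support per-request timeout"


class DropPerRequestTimeoutWarning(logging.Filter):
    """Drop only the per-request timeout message; let every other record through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return not record.getMessage().startswith(MESSAGE_PREFIX)


def silence_timeout_warning(logger_name: str = ASSUMED_LOGGER_NAME) -> logging.Filter:
    """Attach the filter to the named logger and return it so it can be removed later."""
    log_filter = DropPerRequestTimeoutWarning()
    logging.getLogger(logger_name).addFilter(log_filter)
    return log_filter
```

Calling silence_timeout_warning() once at module import time, before the first Dataflow call, would be enough. Note that this only hides the message; it does not change how timeouts are applied.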