google-cloud-functions, google-cloud-dataflow, google-cloud-run, python-3.12

Dataflow call: httplib2 transport does not support per-request timeout. Set the timeout when constructing the httplib2.Http instance


I am upgrading my Cloud Function to Python 3.12.0. After the upgrade, the Dataflow call started logging the following message:

httplib2 transport does not support per-request timeout. Set the timeout when constructing the httplib2.Http instance.

It does not raise an error, but I would like to get rid of the message.

This is the code I use to launch the Dataflow job that loads a GCS file into a BigQuery table:

from googleapiclient.discovery import build
def run_dataflow_template(project_id, gcsPath, job_name, parameters):
    '''
    run dataflow template
    '''
    dataflow = build('dataflow', 'v1b3', cache_discovery=False)

    request = dataflow.projects().templates().launch(
        projectId=project_id,
        gcsPath=gcsPath,
        body={'jobName': job_name, 'parameters': parameters, }
    )
    return request.execute()

I tried using

import socket

timeout_in_sec = 180
socket.setdefaulttimeout(timeout_in_sec)

but the message still appeared.


Solution

  • After talking to Google support, I was able to resolve it.

    Change requirements.txt to pin an older version of google-auth. The latest version at the time did not work for me.

    google-auth==2.14.1
    

    Use google_auth_httplib2 to wrap an httplib2.Http instance that already has the timeout set, and pass that authorized HTTP object to build() when instantiating the Dataflow client.

    from googleapiclient.discovery import build
    import httplib2
    from google.auth import default as get_default_credentials
    from google_auth_httplib2 import AuthorizedHttp
    
    def run_dataflow_template(project_id, gcsPath, job_name, parameters):
        '''
        run dataflow template
        '''
        # http timeout and credential
        credentials, _ = get_default_credentials()
        http_with_timeout = httplib2.Http(timeout=3600)
        authed_http = AuthorizedHttp(credentials, http=http_with_timeout)
    
        # use the http in dataflow
        dataflow = build('dataflow', 'v1b3', cache_discovery=False, http=authed_http)
        
        request = dataflow.projects().templates().launch(
            projectId=project_id,
            gcsPath=gcsPath,
            body={'jobName': job_name, 'parameters': parameters, }
        )
        return request.execute()
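
If the message still shows up in some code path, another option is to filter it out of the logs. The sketch below is not part of the fix from Google support. It assumes the message is emitted through Python's logging module by a logger named google_auth_httplib2 (the module name), which is worth verifying against the installed version. The names DropPerRequestTimeoutWarning and silence_timeout_warning are hypothetical helpers introduced for illustration.

```python
import logging

# Assumption: the message is logged by a logger named after the module.
LOGGER_NAME = "google_auth_httplib2"
WARNING_PREFIX = "httplib2 transport does not support per-request timeout"


class DropPerRequestTimeoutWarning(logging.Filter):
    """Drop only the per-request timeout message; let everything else through."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to discard the record.
        return not record.getMessage().startswith(WARNING_PREFIX)


def silence_timeout_warning() -> logging.Filter:
    """Attach the filter to the (assumed) logger and return it for later removal."""
    log_filter = DropPerRequestTimeoutWarning()
    logging.getLogger(LOGGER_NAME).addFilter(log_filter)
    return log_filter
```

Calling silence_timeout_warning() once at module import time, before run_dataflow_template is invoked, would be enough. A filter is used instead of raising the logger level so that other warnings from the same logger remain visible.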