python google-api-client httplib2 youtube-data-api

Google API client (Python): is it possible to use BatchHttpRequest with ETag caching?


I'm using the YouTube Data API v3.

Is it possible to make a big BatchHttpRequest (e.g., see here) and also to use ETags for local caching at the httplib2 level (e.g., see here)?

ETags work fine for single queries, but I don't understand whether they are also useful for batch requests.


Solution

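  • TL;DR: no. A batch request is always sent as a POST, and httplib2 only caches GET and HEAD responses (see Update 2 at the end).

    First, let's see how a BatchHttpRequest is initialized: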

    import httplib2
    from googleapiclient.discovery import build

    def list_animals(request_id, response, exception):
      if exception is not None:
        # Do something with the exception
        pass
      else:
        # Do something with the response
        pass

    # Batch callbacks are always called with (request_id, response, exception)
    def list_farmers(request_id, response, exception):
      """Do something with the farmers list response."""
      pass

    # 'farm' is a placeholder API name
    service = build('farm', 'v2')

    batch = service.new_batch_http_request()

    batch.add(service.animals().list(), callback=list_animals)
    batch.add(service.farmers().list(), callback=list_farmers)

    # http is the httplib2.Http object the batch is sent with
    http = httplib2.Http()
    batch.execute(http=http)
    

    Second, let's see how ETags are used:

    import httplib2
    from google.appengine.api import memcache
    http = httplib2.Http(cache=memcache)
    

    Now let's analyze.

    Look at the last line of the BatchHttpRequest example, batch.execute(http=http). In the source code, execute calls _refresh_and_apply_credentials, which works with the http object we pass in:

    def _refresh_and_apply_credentials(self, request, http):
        """Refresh the credentials and apply to the request.
        Args:
          request: HttpRequest, the request.
          http: httplib2.Http, the global http object for the batch.
        """
        # For the credentials to refresh, but only once per refresh_token
        # If there is no http per the request then refresh the http passed in
        # via execute()
    

    This means the http object passed to execute can be the caching http object you created for ETags:

    http = httplib2.Http(cache=memcache)
    # This would mean we would get the ETags cached http
    batch.execute(http=http)
    

    Update 1:

    You could also try a custom cache object:

    import httplib2
    from googleapiclient.discovery_cache import DISCOVERY_DOC_MAX_AGE
    from googleapiclient.discovery_cache.file_cache import Cache as FileCache

    # Untested hunch: reuse the client library's file cache as the httplib2 cache
    custCache = FileCache(max_age=DISCOVERY_DOC_MAX_AGE)
    http = httplib2.Http(cache=custCache)
    # This would mean we would get the ETags cached http
    batch.execute(http=http)
    

    This is just a hunch, based on this comment in the httplib2 source:

    """If 'cache' is a string then it is used as a directory name for
    a disk cache. Otherwise it must be an object that supports the
    same interface as FileCache.
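
For illustration, a custom cache object of the kind that docstring describes might look like the sketch below. It assumes the interface in question is the get/set/delete trio that httplib2's FileCache exposes. DictCache is a hypothetical name, not part of httplib2 or the Google client library.

```python
class DictCache:
    """Hypothetical in-memory cache with the get/set/delete methods
    that httplib2's FileCache provides (assumed interface)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Return the cached value, or None when the key is unknown
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def delete(self, key):
        # Deleting a missing key is a no-op
        self._store.pop(key, None)


cache = DictCache()
cache.set("example-key", b"cached response")
hit = cache.get("example-key")
cache.delete("example-key")
miss = cache.get("example-key")

# With httplib2 installed, it would be passed in like this:
# http = httplib2.Http(cache=DictCache())
```

Whether such an object helps with batch requests is a separate question, since the cache is only consulted for requests that httplib2 considers cacheable.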
    

    Update 2 (conclusion):

    After checking the google-api-python-client source code again, I see that BatchHttpRequest always sends a 'POST' request with a content-type of multipart/mixed (see the source code).

    This is a clue that BatchHttpRequest is meant for POSTing data that is then processed later.
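
To make the POST point concrete, here is a rough sketch of how several GET calls end up as parts of one multipart/mixed body. It uses only the standard library's email package. The helper name build_batch_body, the Content-ID values, and the example paths are made up for illustration, and this is not the client library's actual serialization code.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart


def build_batch_body(paths):
    """Wrap one GET request per path into a single multipart/mixed message."""
    message = MIMEMultipart("mixed")
    for index, path in enumerate(paths):
        # Each inner request travels as an application/http part
        part = MIMENonMultipart("application", "http")
        part["Content-ID"] = f"<item{index}>"
        part.set_payload(f"GET {path} HTTP/1.1\r\n\r\n")
        message.attach(part)
    return message


# Hypothetical request paths, for illustration only
batch_message = build_batch_body([
    "/youtube/v3/videos?part=snippet&id=VIDEO_ID_1",
    "/youtube/v3/videos?part=snippet&id=VIDEO_ID_2",
])

# The inner requests are GETs, but the HTTP layer sees only one outer request
outer_method = "POST"
outer_content_type = batch_message.get_content_type()
```

From httplib2's point of view there is a single POST carrying this body, so the ETags of the inner GET responses never reach its cache logic.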

    With that in mind, look at what httplib2's request method does. It calls _updateCache only when one of the following criteria is met:

    1. The request method is in ["GET", "HEAD"], or response.status == 303, or the request is a redirect
    2. ELSE -- response.status in [200, 203] and method in ["GET", "HEAD"]
    3. OR -- response.status == 304 and method == "GET"

    Since a batch is always sent as a POST, none of these conditions is met. This means BatchHttpRequest cannot be used with httplib2-level caching.
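
The criteria listed above can be sketched as a small predicate. This is a simplified model written for this answer, not httplib2 code: the function name may_update_cache is hypothetical, and redirect handling is deliberately left out.

```python
SAFE_METHODS = ("GET", "HEAD")


def may_update_cache(method, status):
    """Simplified model of when a response would be written to the cache.

    Redirect handling is ignored; only the 200/203 and 304 cases are modeled.
    """
    if status in (200, 203) and method in SAFE_METHODS:
        return True
    if status == 304 and method == "GET":
        return True
    return False


single_get = may_update_cache("GET", 200)    # an ordinary single query
revalidated = may_update_cache("GET", 304)   # ETag matched, "Not Modified"
batch_post = may_update_cache("POST", 200)   # a batch request is always a POST
```

In this model single_get and revalidated are True while batch_post is False, which matches the conclusion: the batch response is never stored, so there is no ETag to revalidate on the next call.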