google-cloud-platform llama-index

Llama Index Google Docs Reader fails to read credentials.json file


I am trying to run the following notebook provided by Llama Index:

https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/data_connectors/GoogleDocsDemo.ipynb

I have acquired a credentials.json file with the following contents (I have modified it to remove sensitive information):

{"web":
  {"client_id":"...",
  "project_id":"...",
  "auth_uri":"https://accounts.google.com/o/oauth2/auth",
  "token_uri":"https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
  "client_secret":"GOFSOV-ny3...WQGUc",
  "redirect_uris":["http://localhost:8080/"]}
}

When I come to run this part of the notebook:

document_ids = ["1q165nYvEXTT8ym4pN9lf7aatF-_VYc2ziaIGUBxtkvQ"]
documents = GoogleDocsReader().load_data(document_ids=document_ids)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-5-f0344b0dbcf4> in <cell line: 0>()
      1 # make sure credentials.json file exists
      2 document_ids = ["1q165nYvEXTT8ym4pN9lf7aatF-_VYc2ziaIGUBxtkvQ"]
----> 3 documents = GoogleDocsReader().load_data(document_ids=document_ids)

8 frames
/usr/local/lib/python3.11/dist-packages/llama_index/readers/google/docs/base.py in load_data(self, document_ids)
     70         results = []
     71         for document_id in document_ids:
---> 72             docs = self._load_doc(document_id)
     73             results.extend(docs)
     74 

/usr/local/lib/python3.11/dist-packages/llama_index/readers/google/docs/base.py in _load_doc(self, document_id)
     84             The document text.
     85         """
---> 86         credentials = self._get_credentials()
     87         docs_service = discovery.build("docs", "v1", credentials=credentials)
     88         google_doc = docs_service.documents().get(documentId=document_id).execute()

/usr/local/lib/python3.11/dist-packages/llama_index/readers/google/docs/base.py in _get_credentials(self)
    122                         port = redirect_uris[0].strip("/").split(":")[-1]
    123 
--> 124                 creds = flow.run_local_server(port=port)
    125             # Save the credentials for the next run
    126             with open("token.json", "w") as token:

/usr/local/lib/python3.11/dist-packages/google_auth_oauthlib/flow.py in run_local_server(self, host, bind_addr, port, authorization_prompt_message, success_message, open_browser, redirect_uri_trailing_slash, timeout_seconds, token_audience, browser, **kwargs)
    430         # Fail fast if the address is occupied
    431         wsgiref.simple_server.WSGIServer.allow_reuse_address = False
--> 432         local_server = wsgiref.simple_server.make_server(
    433             bind_addr or host, port, wsgi_app, handler_class=_WSGIRequestHandler
    434         )

/usr/lib/python3.11/wsgiref/simple_server.py in make_server(host, port, app, server_class, handler_class)
    152 ):
    153     """Create a new WSGI server listening on `host` and `port` for `app`"""
--> 154     server = server_class((host, port), handler_class)
    155     server.set_app(app)
    156     return server

/usr/lib/python3.11/socketserver.py in __init__(self, server_address, RequestHandlerClass, bind_and_activate)
    454         if bind_and_activate:
    455             try:
--> 456                 self.server_bind()
    457                 self.server_activate()
    458             except:

/usr/lib/python3.11/wsgiref/simple_server.py in server_bind(self)
     48     def server_bind(self):
     49         """Override server_bind to store the server name."""
---> 50         HTTPServer.server_bind(self)
     51         self.setup_environ()
     52 

/usr/lib/python3.11/http/server.py in server_bind(self)
    134     def server_bind(self):
    135         """Override server_bind to store the server name."""
--> 136         socketserver.TCPServer.server_bind(self)
    137         host, port = self.server_address[:2]
    138         self.server_name = socket.getfqdn(host)

/usr/lib/python3.11/socketserver.py in server_bind(self)
    470         if self.allow_reuse_port and hasattr(socket, "SO_REUSEPORT"):
    471             self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
--> 472         self.socket.bind(self.server_address)
    473         self.server_address = self.socket.getsockname()
    474 

TypeError: 'str' object cannot be interpreted as an integer

I have no idea why I'm getting this or what it means. Please provide some guidance.

I checked that my credentials file is in the correct format and contains the correct fields. I also tried changing some fields manually in the file, but that didn't help either.


Solution

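  • Looking at llama_index/readers/google/docs/base.py, this looks like a simple bug. If redirect_uris is not an empty list, _get_credentials() takes the port from the first URI with redirect_uris[0].strip("/").split(":")[-1], which yields the string "8080" rather than the integer 8080. That string is passed to flow.run_local_server(port=port) and ends up in socket.bind(), which requires an integer port, hence the TypeError.

    A workaround is to use an empty list for redirect_uris in credentials.json. Better still, send them a PR that wraps the extracted string in int().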
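
    To make both suggestions concrete, here is a minimal sketch using only the standard library. It is not the library's actual code: extract_port and without_redirect_uris are hypothetical helper names, and the default_port fallback of 8080 is an assumption made for illustration. The string slicing is copied from the line shown in the traceback; the only change is the int() conversion.

```python
import copy


def extract_port(redirect_uris, default_port=8080):
    """Return the port of the first redirect URI as an int.

    default_port is an assumed fallback for an empty list; it is not
    taken from the library.
    """
    if not redirect_uris:
        return default_port
    # Same slicing as in the traceback: "http://localhost:8080/" -> "8080".
    port_text = redirect_uris[0].strip("/").split(":")[-1]
    # The missing step: socket.bind() needs an int, not a str.
    return int(port_text)


def without_redirect_uris(client_config):
    """Return a copy of a credentials dict with every redirect_uris list emptied.

    This mirrors the workaround of editing credentials.json by hand.
    """
    patched = copy.deepcopy(client_config)
    for section in patched.values():
        if isinstance(section, dict) and "redirect_uris" in section:
            section["redirect_uris"] = []
    return patched


if __name__ == "__main__":
    config = {"web": {"client_id": "...", "redirect_uris": ["http://localhost:8080/"]}}
    print(extract_port(config["web"]["redirect_uris"]))
    print(without_redirect_uris(config)["web"]["redirect_uris"])
```

    To apply the workaround to a real file, you could load credentials.json with json.load, pass the result through without_redirect_uris, and write it back with json.dump before calling GoogleDocsReader().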