We are experiencing a timeout when connecting to Google Sheets, specifically through googleapiclient. The code had been working, but after a new deployment we started getting this error. Even after rolling back the changes, the error persists.
We run Airflow on MWAA (Airflow 2.6.3) and build dependencies from Python WHL files. We tried installing the requirements from the Python Package Index instead, but the installation timed out with "WARNING: requirements.txt installation timed out after 9 minutes. Some requirements may not have installed." and the DAGs were broken.
Airflow is able to connect to other third-party services (Jira and others), but DAGs that connect to the Google Sheets API are failing.
Please share any solution, or anywhere we could look to resolve the issue. Thanks.
Code Snippet
from googleapiclient.discovery import build
service = build(
    serviceName='sheets',
    version='v4',
    credentials=<credentials>).spreadsheets()
service.get(spreadsheetId=<spreadsheet_id>).execute()
And we get the following stack trace:
Traceback (most recent call last):
File "/usr/local/airflow/dags/common/spreadsheet.py", line 199, in get_spreadsheet
return service.get(spreadsheetId=self._id).execute()
File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 923, in execute
resp, content = _retry_request(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 222, in _retry_request
raise exception
File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 191, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 209, in request
self.credentials.before_request(self._request, method, uri, request_headers)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/auth/credentials.py", line 151, in before_request
self.refresh(request)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/service_account.py", line 434, in refresh
access_token, expiry, _ = _client.jwt_grant(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 312, in jwt_grant
response_data = _token_endpoint_request(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 272, in _token_endpoint_request
response_status_ok, response_data, retryable_error = _token_endpoint_request_no_throw(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 219, in _token_endpoint_request_no_throw
request_succeeded, response_data, retryable_error = _perform_request()
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 195, in _perform_request
response = request(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 119, in __call__
response, data = self.http.request(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1724, in request
(response, content) = self._request(
File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1444, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1366, in _conn_request
conn.connect()
File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1156, in connect
sock.connect((self.host, self.port))
TimeoutError: timed out
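The trace ends in sock.connect during the service-account token refresh, so the timeout happens at the TCP connection level, before any Sheets API request is sent. A minimal sketch for checking, from inside a worker, which resolved addresses actually accept a connection is below. The helper name probe_addresses is hypothetical, and the host in the usage comment (oauth2.googleapis.com, the usual default token endpoint for service accounts) is an assumption, since the trace does not show the host.

```python
import socket


def probe_addresses(host, port, timeout=5.0):
    """Try a TCP connect to every address the resolver returns for host.

    Returns a list of (family_label, ip, outcome) tuples so that IPv4 and
    IPv6 results can be compared side by side.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, 0, socket.SOCK_STREAM
    ):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((label, sockaddr[0], "ok"))
        except OSError as exc:  # includes timeouts and unreachable networks
            results.append((label, sockaddr[0], f"failed: {exc!r}"))
    return results


# Example call (assumed host, see the note above):
# for row in probe_addresses("oauth2.googleapis.com", 443):
#     print(row)
```

If only one address family fails, that points at routing for that family on the network rather than at the Google client libraries.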
Configurations:
MWAA: Airflow 2.6.3
Installed Packages (Using plugins.zip):
- Levenshtein-0.21.1
- PyGithub-1.59.0
- adtk-0.6.2
- apache-airflow-providers-atlassian-jira-2.1.1
- apache-airflow-providers-github-2.3.1
- apache-airflow-providers-mysql-5.1.1
- apache-airflow-providers-snowflake-4.2.0
- asttokens-2.2.1
- atlassian-python-api-3.39.0
- aws-requests-auth-0.4.3
- backcall-0.2.0
- cachetools-5.3.1
- comm-0.2.2
- cycler-0.12.1
- debugpy-1.8.1
- defusedxml-0.7.1
- executing-1.2.0
- fonttools-4.50.0
- google-api-core-2.11.0
- google-api-python-client-2.92.0
- google-auth-2.21.0
- google-auth-httplib2-0.1.0
- googleapis-common-protos-1.59.1
- gql-3.3.0
- graphql-core-3.2.3
- httplib2-0.22.0
- iniconfig-2.0.0
- ipykernel-6.25.1
- ipython-8.14.0
- jedi-0.18.2
- jira-3.5.2
- joblib-1.3.2
- jupyter-client-8.3.0
- jupyter-core-5.3.1
- kiwisolver-1.4.5
- matplotlib-3.5.2
- matplotlib-inline-0.1.6
- mpld3-0.5.9
- mysqlclient-2.2.0
- nest-asyncio-1.6.0
- numpy-1.24.4
- oauthlib-3.2.2
- oscrypto-1.3.0
- pandas-1.5.3
- parso-0.8.3
- patsy-0.5.6
- pickleshare-0.7.5
- pillow-10.2.0
- playwright-1.37.0
- protobuf-4.23.4
- pure-eval-0.2.2
- py-1.11.0
- pyOpenSSL-23.2.0
- pyasn1-0.4.8
- pyasn1-modules-0.2.8
- pycryptodomex-3.18.0
- pyee-9.0.4
- pynacl-1.5.0
- pypika-0.48.9
- pytest-7.4.0
- python-Levenshtein-0.21.1
- pyzmq-25.1.0
- requests-oauthlib-1.3.1
- retry-0.9.2
- rsa-4.9
- scikit-learn-1.3.0
- scipy-1.12.0
- snowflake-connector-python-3.0.4
- snowflake-sqlalchemy-1.4.7
- sortedcontainers-2.4.0
- sql-formatter-0.6.2
- stack-data-0.6.2
- statsmodels-0.14.1
- thefuzz-0.20.0
- threadpoolctl-3.4.0
- traitlets-5.9.0
- uritemplate-4.1.1
For anyone who ends up here: after a lot of trial and error, we found that the problem was IPv6 on our network interacting badly with the Google API packages (per this answer: https://stackoverflow.com/a/75375184/15938510). We removed IPv6 from the AWS network, and the code now works normally.
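If removing IPv6 from the network is not an option, a possible code-level alternative is to make Python resolve only IPv4 addresses while the Google client is in use. The sketch below is an assumption on our part and was not what we deployed: it relies on httplib2 calling socket.getaddrinfo at connect time, and the names getaddrinfo_ipv4_only and force_ipv4 are hypothetical helpers, not part of any Google library.

```python
import socket
from contextlib import contextmanager

# Keep a reference to the real resolver so it can be restored afterwards.
_original_getaddrinfo = socket.getaddrinfo


def getaddrinfo_ipv4_only(host, port, family=0, type=0, proto=0, flags=0):
    """Resolve host like socket.getaddrinfo, but always ask for AF_INET."""
    return _original_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)


@contextmanager
def force_ipv4():
    """Temporarily patch socket.getaddrinfo so only IPv4 results are returned.

    The patch is process-wide while the block is active, so it also affects
    any other code that resolves hostnames at the same time.
    """
    socket.getaddrinfo = getaddrinfo_ipv4_only
    try:
        yield
    finally:
        socket.getaddrinfo = _original_getaddrinfo
```

The Sheets call from the question would then be wrapped in "with force_ipv4():" so that both the token refresh and the API request connect over IPv4. Because the patch is global, fixing the network configuration is the cleaner option when it is available.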