
Issue with Workload Identity Federation (WIF) and the gsutil acl set Command


I have an existing setup in which I use Workload Identity Federation (WIF) to authenticate CircleCI with GCP, and everything has been working fine. It's a simple workflow that uses the gsutil -m rsync -d -r folder/ gs://bucket command to sync a folder with a GCS bucket.

I recently modified my workflow to also run the following command right after rsync, which is intended to mark all objects in the bucket as public. I know I could mark the whole bucket as public instead, but without going into too much detail, there is a specific reason I'm doing it this way.

gsutil -m acl set -R -a public-read gs://bucket

After making this change, I see the following error in CircleCI when the command is executed:

Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil.py", line 151, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 436, in main
    return _RunNamedCommandAndHandleExceptions(
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 785, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 633, in _RunNamedCommandAndHandleExceptions
    return command_runner.RunNamedCommand(command_name,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 421, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/acl.py", line 587, in RunCommand
    self._SetAcl()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/acl.py", line 411, in _SetAcl
    self.SetAclCommandHelper(SetAclFuncWrapper, SetAclExceptionHandler)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1114, in SetAclCommandHelper
    canned_acls = storage_uri.canned_acls()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/storage_uri.py", line 220, in canned_acls
    conn = self.connect()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/storage_uri.py", line 121, in connect
    self.connection = GSConnection(access_key_id,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/gs/connection.py", line 45, in __init__
    super(GSConnection, self).__init__(gs_access_key_id, gs_secret_access_key,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/s3/connection.py", line 202, in __init__
    super(S3Connection, self).__init__(host,
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/connection.py", line 572, in __init__
    self._auth_handler = auth.get_auth_handler(
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/vendored/boto/boto/auth.py", line 1033, in get_auth_handler
    raise boto.exception.NoAuthHandlerFound(
boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 4 handlers were checked. ['DevshellAuth', 'HmacAuthV1Handler', 'OAuth2Auth', 'OAuth2ServiceAccountAuth'] Check your credentials

The error message is confusing because it points to an authentication issue, but I know authentication is not the problem: the rsync command right before it works fine. I also modified the workflow to run gcloud auth list before acl set, and that command also shows that gcloud is authenticated.

I also know it's not an authorization issue. The service account used by WIF has the "Storage Admin" and "Storage Object Admin" roles, which grant the storage.buckets.* and storage.objects.* permissions, so it already has more than the permissions required to set a public ACL on the bucket or its objects. You can compare the permissions required by gsutil acl set with those granted by these roles using the following documentation:

https://cloud.google.com/storage/docs/access-control/iam-gsutil

https://cloud.google.com/storage/docs/access-control/iam-roles
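To confirm which roles the WIF service account actually holds, a check along these lines could be added to the workflow. This is a sketch: the function name, project ID, and service account email are hypothetical placeholders, and it only lists project-level bindings, not roles granted directly on the bucket.

```shell
#!/usr/bin/env bash
# Sketch: list the project-level IAM roles bound to one service account.
# ASSUMPTION: list_sa_roles, the project ID and the email are hypothetical.
set -euo pipefail

list_sa_roles() {
  local project_id="$1" sa_email="$2"
  # Flatten the policy to one row per member, keep only this service
  # account, and print just the role of each matching binding.
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${sa_email}" \
    --format="value(bindings.role)"
}
```

For example, list_sa_roles my-project ci-deployer@my-project.iam.gserviceaccount.com would be expected to print roles/storage.admin and roles/storage.objectAdmin for the setup described above. Bucket-level bindings can be inspected separately with gsutil iam get gs://bucket.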

Surprisingly, if I remove WIF authentication and use a service account key directly, the error goes away and gsutil acl set works fine. That suggests a problem with the WIF configuration, but nothing looks out of the ordinary to me; I followed this blog post by CircleCI to set up OIDC authentication / WIF for GCP. The issue seems specific to the gsutil acl set command, since other gsutil commands (like rsync) work fine with WIF authentication. I don't want to use service account keys, because Google recommends against them: they pose a security risk if compromised.
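For reference, a WIF login step in a CircleCI job typically looks something like the following sketch: it writes the job's OIDC token to a file, generates an external-account credential configuration that impersonates the service account, and activates it for gcloud (and hence for Cloud SDK-managed gsutil). The GCP_* variable names and the wif_login function are hypothetical; CIRCLE_OIDC_TOKEN is the token CircleCI exposes to the job. The exact steps in the CircleCI blog post may differ.

```shell
#!/usr/bin/env bash
# Sketch: authenticate gcloud with Workload Identity Federation in CircleCI.
# ASSUMPTION: GCP_PROJECT_NUMBER, GCP_WIP_ID, GCP_WIP_PROVIDER_ID and
# GCP_SERVICE_ACCOUNT_EMAIL are hypothetical project-specific variables.
set -euo pipefail

wif_login() {
  local token_file cred_file audience
  token_file="$(mktemp)"
  cred_file="$(mktemp)"
  # The credential config re-reads this file at token-exchange time,
  # so it must outlive this function (it is deliberately not deleted).
  printf '%s' "${CIRCLE_OIDC_TOKEN:?CIRCLE_OIDC_TOKEN is not set}" > "$token_file"
  audience="projects/${GCP_PROJECT_NUMBER}/locations/global/workloadIdentityPools/${GCP_WIP_ID}/providers/${GCP_WIP_PROVIDER_ID}"
  # Generate an external-account credential configuration file.
  gcloud iam workload-identity-pools create-cred-config "$audience" \
    --service-account="${GCP_SERVICE_ACCOUNT_EMAIL}" \
    --credential-source-file="$token_file" \
    --output-file="$cred_file"
  # Make it the active gcloud credential.
  gcloud auth login --brief --cred-file="$cred_file"
}
```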

What I've tried so far:

  1. I tried the private canned ACL instead of public-read, to rule out an issue specific to one ACL.
  2. I verified the gsutil version; CircleCI is using the latest version, 5.24.
  3. I tried removing the -m flag, which runs operations in parallel and is known to cause issues sometimes.
  4. I checked the GCP Logs Explorer for additional logs, but there were none.
  5. I enabled debug logging for gsutil / boto to see if it logs any additional information, by running the following before the acl set command, but it didn't log anything extra:
       export BOTO_CONFIG=/home/circleci/.boto/boto.cfg
       echo "[Boto]" >> "$BOTO_CONFIG"
       echo "debug = 2" >> "$BOTO_CONFIG"
       gsutil acl set -R -a public-read gs://bucket
  6. I checked whether the access token has different scopes when using a service account key vs. WIF, using the following curl command, but both showed the same scopes:
       access_token=$(gcloud auth print-access-token) && curl "https://oauth2.googleapis.com/tokeninfo?access_token=${access_token}"

Any help is appreciated.


Solution

  • It turns out the failure happens when boto tries to open a connection to GCP in order to resolve the canned ACL (public-read). The traceback shows that storage_uri.canned_acls() calls connect(), which builds a legacy boto GSConnection, and none of the four boto auth handlers it checked was ready to authenticate with the WIF credentials. I'm not sure why this happens (that's for the gsutil or boto developers to investigate), but as a workaround I switched to the ch subcommand, which sets ACLs without using canned ACLs. The following alternative gsutil command works fine with WIF:

    gsutil -m acl ch -u AllUsers:READ gs://bucket/**
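Putting the sync and the workaround together, the deploy step could look like the following sketch. The publish_folder function and its arguments are hypothetical names. The wildcard URL is quoted so that the local shell passes it to gsutil unchanged and gsutil expands it against the bucket contents.

```shell
#!/usr/bin/env bash
# Sketch: sync a local folder to a bucket, then grant public read on every
# object with "acl ch", which per the workaround above avoids the canned-ACL
# lookup done by "acl set -a public-read".
# ASSUMPTION: publish_folder and its arguments are hypothetical.
set -euo pipefail

publish_folder() {
  local src_dir="$1" bucket="$2"
  # Mirror src_dir into the bucket, deleting remote objects not present locally.
  gsutil -m rsync -d -r "$src_dir" "gs://${bucket}"
  # Add an AllUsers READ grant to each object; quoted so bash doesn't glob it.
  gsutil -m acl ch -u AllUsers:READ "gs://${bucket}/**"
}
```

Usage would be publish_folder folder/ bucket, matching the commands above.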