I'm trying to use a specific (non-default) service account to run kfp pipelines in Vertex AI. JSON keys are not an option.
Ideally, the components would get both the project ID and the credentials using google.auth.default(), as suggested in the google.auth user guide.
So far, I've tried:
- kfp.v2.google.client.AIPlatformClient: client instantiated with the project ID specified, and the pipeline run with create_run_from_job_spec and the service_account keyword argument
- google.cloud.aiplatform.pipeline_jobs.PipelineJob: object instantiated with the project ID, and the pipeline run with submit and the service_account kwarg

I've tried these with both the actual pipeline (running on custom-built containers) and a minimal working example (using lightweight Python components). In all cases, when I run creds, project = google.auth.default() and then print the project and creds.service_account_email, I get a project ID I don't recognize (always the same one in all cases) and default for the service account email.
I think I must be doing something wrong, but I can't figure out what. It seems like the configuration I'm passing to the pipeline run isn't being used at all.
For reference, the MWE:
from kfp.v2 import dsl

@dsl.component(packages_to_install=['google-auth'])
def check_auth(name: str) -> str:
    # Imports live inside the function because lightweight components run in their own container
    import google.auth
    creds, project = google.auth.default()
    print(f'Project is: {project}')
    print(f'Got creds for: {creds.service_account_email}')
    return project

@dsl.pipeline(
    name='adc-mwe-pipeline'
)
def pipeline() -> str:
    auth_check = check_auth(name='name')
    return auth_check.output

from google.cloud.aiplatform import pipeline_jobs
from kfp.v2 import compiler

# Compile the pipeline to a job spec, then submit it with the desired service account
compiler.Compiler().compile(pipeline_func=pipeline, package_path='mwe.json')
start_pipeline = pipeline_jobs.PipelineJob(
    display_name='mwe',
    template_path='mwe.json',
    location='some-location',
    project='my-project',
    enable_caching=False
)
start_pipeline.submit(service_account="my-service-account")
I figured out that the correct way to use application default credentials is to not request credentials explicitly at all and let the client library pick them up.
So, for example, with BigQuery:
from google.cloud import bigquery
client = bigquery.Client(project='my_project')
query_job = client.query(some_sql_query)
Running this on a Compute Engine instance or in a pipeline component uses the credentials of the service account attached to the instance, or of the service account used to submit the pipeline (as in the question).
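If you still want to print which identity a component is running as, here is a rough sketch. As far as I can tell from the google-auth docs, credentials obtained from the metadata server are not guaranteed to have service_account_email set until refresh has been called, which would explain the default in the question. The helper name resolve_service_account_email is my own (not part of any library), and the lazy import assumes the requests package is installed alongside google-auth (so the MWE would need it in packages_to_install).

```python
def resolve_service_account_email(creds, request=None):
    """Return the service account email behind `creds`, refreshing once if needed.

    `creds` is expected to be the first element returned by google.auth.default().
    `request` is an optional transport request object; if omitted, one is built
    from google.auth.transport.requests (assumes `requests` is installed).
    """
    email = getattr(creds, "service_account_email", None)

    # Metadata-server credentials may report the placeholder "default" until refreshed.
    if email in (None, "default"):
        if request is None:
            # Imported lazily so the helper itself has no import-time dependency.
            from google.auth.transport.requests import Request
            request = Request()
        creds.refresh(request)
        email = getattr(creds, "service_account_email", None)

    return email
```

Inside a component you would call it as resolve_service_account_email(google.auth.default()[0]). This is only for debugging; the client libraries do not need it.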
Hope this helps someone else. Quite frustrating that it isn't documented clearly.