I'm trying to run a pipeline on a private Cloud Data Fusion instance from Cloud Functions. I assumed I would need to create a Serverless VPC Access connector, as described here: https://cloud.google.com/vpc/docs/configure-serverless-vpc-access?_ga=2.30674431.-1361434534.1676966158#before_you_begin
However, when I made a request to the following API without creating a Serverless VPC Access connector, the pipeline ran successfully.
curl -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/namespace-id/apps/pipeline-name/workflows/DataPipelineWorkflow/start"
Reference site: https://cloud.google.com/data-fusion/docs/reference/cdap-reference#start_a_batch_pipeline
Why is a Serverless VPC Access connector unnecessary when accessing a private Cloud Data Fusion instance via the API from Cloud Functions?
If you look up the IP address of the CDAP API host, you will see that it resolves to a public IP address:
nslookup <instance-id>-<project-id>-dot-<region-code>.datafusion.googleusercontent.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53
Non-authoritative answer:
<instance-id>-<project-id>-dot-<region-code>.datafusion.googleusercontent.com canonical name = googlehosted.l.googleusercontent.com.
Name: googlehosted.l.googleusercontent.com
Address: 64.233.170.132
Name: googlehosted.l.googleusercontent.com
Address: 2404:6800:4003:c1a::84
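The same check can be done programmatically. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, and it classifies addresses with the standard library's notion of "globally routable", which is only a rough proxy for "reachable from Cloud Functions without a connector".

```python
import ipaddress
import socket


def is_public_address(address: str) -> bool:
    """Return True if the address is globally routable (not RFC 1918, loopback, etc.)."""
    return ipaddress.ip_address(address).is_global


def resolve_addresses(hostname: str) -> list[str]:
    """Resolve a hostname to its unique IPv4/IPv6 addresses via the system resolver."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def report(hostname: str) -> dict[str, bool]:
    """Map each resolved address of hostname to whether it is public."""
    return {addr: is_public_address(addr) for addr in resolve_addresses(hostname)}
```

Calling report() with the CDAP API hostname should list the resolved addresses, each flagged True if it is public. Note that this uses whatever resolver the machine is configured with, whereas the nslookup above explicitly queries 8.8.8.8.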
All requests sent to the CDAP API go to a publicly accessible API endpoint by default, which is why the call from Cloud Functions succeeds without a Serverless VPC Access connector. You can set up Private Google Access so that requests to *.datafusion.googleusercontent.com
are routed to a private IP address when the source VM is inside a GCP VPC.
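Since the endpoint is public, a plain HTTPS request from the function is enough. As a rough sketch of the start call in Python, using only the standard library: the helper names are hypothetical, the CDAP endpoint and access token are assumed to be obtained elsewhere (for example from the instance's apiEndpoint and the function's service account), and error handling is omitted.

```python
import urllib.request
from urllib.parse import quote


def build_start_request(cdap_endpoint: str, namespace: str, pipeline: str,
                        auth_token: str) -> urllib.request.Request:
    """Build the POST request that starts a batch pipeline's DataPipelineWorkflow."""
    url = (
        f"{cdap_endpoint.rstrip('/')}/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline, safe='')}/workflows/DataPipelineWorkflow/start"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {auth_token}"},
    )


def start_pipeline(cdap_endpoint: str, namespace: str, pipeline: str,
                   auth_token: str) -> int:
    """Send the start request and return the HTTP status code."""
    request = build_start_request(cdap_endpoint, namespace, pipeline, auth_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and headers before anything is sent.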
VPC peering serves a different purpose: it lets VMs in the Cloud Data Fusion (CDF) tenant project reach IP addresses in the customer project's VPC, which is required for running previews and validating source/sink connections during pipeline deployment.