google-cloud-platform sqoop google-cloud-dataproc google-cloud-vpn

Can GCP Dataproc Sqoop data from (or run other jobs on) a local DB?


Can GCP Dataproc sqoop import data from local DB to put into GCP Storage (without GCP VPC)?

We have a remote Oracle DB connected to our local network via a VPN tunnel, and each day we use a Hadoop cluster to extract data from it with Apache Sqoop. We would like to replace this process with a GCP Dataproc cluster to run the Sqoop jobs and GCP Storage to hold the output. I found this article, Moving Data with Apache Sqoop in Google Cloud Dataproc, which appears to do something similar, but it assumes that users have a GCP VPC (which I did not intend on purchasing).

So my question is: can GCP Dataproc run a Sqoop import from a local DB into GCP Storage without a GCP VPC?


Solution

  • Without a VPC or VPN, you will not be able to give Dataproc access to your local DB.

    If a VPC does not suit you, you can use Cloud VPN if it meets your needs better: https://cloud.google.com/vpn/docs/

    The only other option is to open your local DB to the Internet so that Dataproc can reach it without a VPC/VPN, but this is inherently insecure.
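Whichever route you pick, it can help to confirm from a Dataproc node that the DB is actually reachable before debugging Sqoop itself. Below is a minimal sketch of such a check, assuming a plain TCP connection attempt is a good enough signal. The function name `can_reach` and the host name in the usage comment are hypothetical, and 1521 is only the conventional default port for an Oracle listener, so yours may differ.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (hypothetical host, run on a cluster node):
#   can_reach("db.example.internal", 1521)
```

A `True` result only shows that the port accepts TCP connections from that node. It says nothing about JDBC credentials or Oracle-side access rules, so a Sqoop job can still fail for other reasons.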