postgresql google-bigquery google-cloud-sql ssh-tunnel google-datastream

How can I use a Cloud SQL (PostgreSQL) server as the SSH host when connecting from Datastream?


I am trying to connect to my Cloud SQL PostgreSQL database from Datastream (to replicate data into BigQuery). The instance uses a public IP, and since it is set up to require trusted client certificates, my understanding is that I need to use a Forward-SSH tunnel as the connectivity method in Datastream (please correct me if I'm wrong on this).

According to this guide, it should be possible to use the database server to terminate the SSH tunnel. It does not specify in detail how to do it, only to "allow network traffic to reach the tunnel server or database host via SSH, which is generally on TCP port 22".

I am not able to connect, and getting the error "We can't connect to the data source. Verify that the Forward SSH tunnel configuration is configured correctly in the connection profile and that the port is open on the SSH tunnel server."

I ran a Connectivity Test against the database, which confirmed that port 22 is not reachable.
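For a quick manual check of the same thing, here is a minimal Python sketch. `tcp_port_reachable` is a hypothetical helper (not part of any Google Cloud tooling), and the IP in the usage line is a placeholder from a documentation range. It only tells you whether the port answers from the machine where you run it, not from Datastream's regional IPs.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, and timed-out connections all raise OSError subclasses.
        return False


if __name__ == "__main__":
    # Placeholder address: substitute the public IP of your SSH host.
    print(tcp_port_reachable("203.0.113.10", 22))
```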

How can I get this to work? Is any setup required on the database side? How do I allow traffic on port 22? Or should I use a different port?

Appreciate any help on this!


Solution

  • Cloud SQL for PostgreSQL recently (within the past couple of weeks) updated its "Require SSL" setting to be more configurable. Previously, selecting "Allow only SSL connections" forced successful connections to use mTLS with client certificates. The settings are now more flexible and granular.


    For most cases, I would recommend selecting Allow only SSL connections, which forces connections to be encrypted with SSL but does not require mTLS or client certificates. This setting lets you use the IP allowlisting connectivity option in Datastream and follow the Datastream Getting Started quickstart while still enforcing an encrypted connection.

    Users who do want the additional overhead of managing client certificates and requiring mTLS can enable this by selecting Require trusted client certificates in the Manage SSL setting of their Cloud SQL instance.

    In that case, yes, as you mentioned, you would need to use the Forward-SSH tunnel connectivity option in Datastream.

    My understanding is that since your source database is a Cloud SQL for PostgreSQL instance with a public IP, you have to configure a separate tunnel server (a Compute Engine VM) and cannot use the database server directly. This is because Cloud SQL is a managed service: the database server runs on a protected Cloud SQL network, and you cannot open port 22 on it or run an SSH server there. Instead, create a Compute Engine VM to act as your tunnel (SSH) server, where you can control the TCP firewall and port-forwarding rules at a more granular level. Using the database server itself to terminate the tunnel applies to the case where your source is a self-hosted PostgreSQL database.
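The choice described in this answer, and the network hops each option needs, can be summarized in a short Python sketch. Everything here is hypothetical: `plan_connectivity` and `Hop` are illustrative names, the `SslMode` values are assumed to mirror the `sslMode` field that the Cloud SQL Admin API exposes under `settings.ipConfiguration` (check the API reference), and all IPs are placeholders from documentation ranges. In practice, Datastream's source IPs are listed per region in the Datastream documentation. The VM-to-Cloud SQL hop assumes the VM reaches the instance over its public IP, so the VM's external IP would need to be in the instance's authorized networks.

```python
from dataclasses import dataclass
from enum import Enum


class SslMode(str, Enum):
    # Assumed to mirror Cloud SQL's settings.ipConfiguration.sslMode values.
    ALLOW_UNENCRYPTED_AND_ENCRYPTED = "ALLOW_UNENCRYPTED_AND_ENCRYPTED"
    ENCRYPTED_ONLY = "ENCRYPTED_ONLY"
    TRUSTED_CLIENT_CERTIFICATE_REQUIRED = "TRUSTED_CLIENT_CERTIFICATE_REQUIRED"


@dataclass(frozen=True)
class Hop:
    """One TCP connection that must be allowed for replication to work."""
    source: str
    destination: str
    port: int
    where_to_allow: str


def plan_connectivity(ssl_mode, datastream_ips, cloud_sql_ip, vm_ip=None, pg_port=5432):
    """Hypothetical helper: pick a Datastream connectivity option (following the
    recommendation above) and list the hops that must be allowed for it."""
    mode = SslMode(ssl_mode)  # accepts either a SslMode member or its string value
    if mode is not SslMode.TRUSTED_CLIENT_CERTIFICATE_REQUIRED:
        # Datastream connects straight to the instance's public IP, so its
        # regional IPs go into the instance's authorized networks.
        hops = [Hop(ip, cloud_sql_ip, pg_port, "Cloud SQL authorized networks")
                for ip in datastream_ips]
        return "IP allowlisting", hops
    if vm_ip is None:
        raise ValueError("Forward-SSH needs a Compute Engine VM as the tunnel server")
    # Hop 1: Datastream -> tunnel VM over SSH (TCP 22), opened by a VPC firewall rule.
    hops = [Hop(ip, vm_ip, 22, "VPC firewall rule on the tunnel VM")
            for ip in datastream_ips]
    # Hop 2: tunnel VM -> Cloud SQL public IP (VM's external IP must be authorized).
    hops.append(Hop(vm_ip, cloud_sql_ip, pg_port, "Cloud SQL authorized networks"))
    return "Forward-SSH tunnel", hops


if __name__ == "__main__":
    option, hops = plan_connectivity(
        "TRUSTED_CLIENT_CERTIFICATE_REQUIRED",
        ["198.51.100.1", "198.51.100.2"],  # placeholder Datastream IPs
        cloud_sql_ip="203.0.113.5",
        vm_ip="192.0.2.10",
    )
    print(option)
    for hop in hops:
        print(hop)
```

With these placeholders, the Forward-SSH plan yields two SSH hops into the VM (one per Datastream IP) and one PostgreSQL hop from the VM to the instance. Port 22 is opened on the VM, which is exactly what Cloud SQL does not let you do on the database server itself.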