google-cloud-platformgoogle-datastream

Datastream configuration not working for Cloud SQL PostgreSQL as source and BigQuery as destination


I wanted to quickly give a try to datastream from cloud SQL PotgreSQL to BigQuery. I created a test Cloud SQL instance where I use postgres who is already as cloudsqlsuperuser. But the wizard provided by Datastream is not helping: part of the provided statements are not working.

Here are the steps I followed.

enter image description here

  1. I activated cloudsql.logical_decoding flag as required by the Enable logical replication on the database Step

enter image description here

  1. For the Create a publication and a replication slot step, I succesfully ran the first command CREATE PUBLICATION DATASTREAM_PUBLICATION FOR ALL TABLES; But the second SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('DATASTREAM_SLOT_NAME', 'PGOUTPUT'); failed

  2. First I had this error : ERROR: must be superuser or replication role to use replication slots SQL state: 42501 I fixed with this command ALTER USER postgres WITH REPLICATION;

  3. When I re ran SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('DATASTREAM_SLOT', 'PGOUTPUT'); Then I had this error : ERROR: replication slot name "DATASTREAM_SLOT_NAME" contains invalid character HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character. SQL state: 42602 I fixed it using lower case for the slot name.

  4. When I ran command SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('datastream_slot', 'PGOUTPUT'); I got this error: ERROR: could not access file "PGOUTPUT": No such file or directory SQL state: 58P01 I fixed it by putting PGOUTPUT in lower case and re ran : SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('datastream_slot', 'pgoutput');

enter image description here

  1. For the Create Datastream User step I could run : CREATE USER datastream_user WITH ENCRYPTED PASSWORD '*****';

but Fail to run: GRANT RDS_REPLICATION TO datastream_user; with this error role "rds_replication" does not exist SQL state: 42704

I could run GRANT SELECT ON ALL TABLES IN SCHEMA public TO datastream_user;

I could run GRANT USAGE ON SCHEMA public TO datastream_user;

I failed to run ALTER DEFAULT PRIVILEGES IN SCHEMA public with this error : ERROR: syntax error at end of input LINE 1: ALTER DEFAULT PRIVILEGES IN SCHEMA public

I failed to run GRANT SELECT ON TABLES TO datastream_user; with this error: ERROR: relation "tables" does not exist SQL state: 42P01

I dont have solutions for the failed statement, but also these are the ones provided on the console for Datastream. Any help will be much appreciated.


Solution

  • There's apparently a mixup in the documentation steps in the UI. You can follow the online docs for the correct steps.

    Specifically, in the CREATE PUBLICATION AND REPLICATION SLOT section you already figured it out, the correct command is:

    SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('[REPLICATION_SLOT_NAME]', 'pgoutput'); 
    

    And in the CREATE DATASTREAM USER section:

    1. You don't need to run that first command (that's copy-pasted from RDS apparently, not relevant here)

    2. Those last two "steps" should actually be a single command:

      ALTER DEFAULT PRIVILEGES IN SCHEMA [MY_SCHEMA] GRANT SELECT ON TABLES TO [MY_USER];