databasecdcdata-collectionstreamsets

Can Streamsets Data Collector CDC read from and write to multiple tables?


I have a MSSQL database whose structure is replicated over a Postgres database. I've enabled CDC in MSSQL and I've used the SQL Server CDC Client in StreamSets Data Collector to listen for changes in that db's tables.

But I can't find a way to write to the same tables in Postgres.

For example I have 3 tables in MSSQL: tableA, tableB, tableC. Same tables I have in Postgres. I insert data into tableA and tableC. I want those changes to be replicated over Postgres.

In StreamSets DC, in order to write to Postgres, I'm using JDBC Producer and in the Table Name field I've specified: ${record:attributes('jdbc.tables')}.

Doing this, the data will be read from tableA_CT, tableB_CT, tableC_CT. Tables created by MSSQL when you enable the CDC option. So I'll end up with those table names in the ${record:attribute('jdbc.tables')}.

Is there a way to write to Postgres in the same tables as in MSSQL ?


Solution

  • You can cut the _CT suffix off the jdbc.tables attribute by using an Expression Evaluator with a Header Attribute Expression of:

    ${str:isNullOrEmpty(record:attribute('jdbc.tables')) ? '' : 
      str:substring(record:attribute('jdbc.tables'), 0, 
        str:length(record:attribute('jdbc.tables')) - 3)}
    

    Note - the str:isNullOrEmpty test is a workaround for SDC-9269.