Tags: postgresql, bulk-insert, debezium

How does a Postgres bulk insert affect Debezium?


We have a Postgres instance with multiple databases (A, B, C, D, etc.). We have Debezium CDC set up on database A ONLY. But we also need to bulk insert millions of rows into other databases, like B. It seems that Debezium CDC still needs to scan B's WAL and filter out its events, so a bulk insert may slow down or block Debezium. Am I right?

How can we avoid this kind of blocking or slowness? Should we stop Debezium Connect temporarily during the bulk insert, or is there a way to skip scanning database B's WAL?

Thanks


Solution

  • We cannot skip scanning or reading database B's WAL. The walsender must read all transactions so that it sees changes to the shared catalogs across databases; this is required to maintain a consistent snapshot. Postgres logical decoding involves three main steps:

    1. Reading the WAL.

    2. Queuing the changes into the reorder buffer.

    3. Decoding the changes from the reorder buffer when the transaction commits.

    Database A's walsender will still read database B's changes, but it will not put B's changes into the reorder buffer.

    Here is the check/filter in the source code for the insert case: https://github.com/postgres/postgres/blob/65db0cfb4c036b14520a22dba5a858185b713643/src/backend/replication/logical/decode.c#L913

    So while logical decoding is running, inserting millions of rows in parallel (even into another database) adds WAL that the walsender must read, and the decoding rate drops. That is the slowness you are seeing.
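The three steps above can be sketched as a toy simulation. This is plain Python, not Postgres internals: `WalRecord`, `ReorderBuffer`, and `decode_wal` are invented names for illustration. The point is that every WAL record is *read* (step 1, the unavoidable cost), but only records belonging to the slot's database are queued (step 2) and later emitted at commit (step 3):

```python
from dataclasses import dataclass, field

@dataclass
class WalRecord:
    xid: int                 # transaction id
    database: str            # database the change belongs to
    change: str              # e.g. "INSERT ..."
    is_commit: bool = False  # commit record for this xid

@dataclass
class ReorderBuffer:
    # changes queued per transaction, emitted only at commit
    by_xid: dict = field(default_factory=dict)

    def queue(self, rec: WalRecord) -> None:
        self.by_xid.setdefault(rec.xid, []).append(rec.change)

    def commit(self, xid: int) -> list:
        return self.by_xid.pop(xid, [])

def decode_wal(records, slot_database):
    """Walk the WAL stream for one slot. Every record is read, but
    records from other databases are dropped before they reach the
    reorder buffer -- mirroring the filter linked above in decode.c."""
    rb = ReorderBuffer()
    emitted = []
    for rec in records:          # step 1: read every WAL record
        if rec.is_commit:
            # step 3: decode the queued changes at commit time
            emitted.extend(rb.commit(rec.xid))
        elif rec.database == slot_database:
            # step 2: queue only our own database's changes
            rb.queue(rec)
        # else: the record was still read, just never queued
    return emitted

wal = [
    WalRecord(1, "A", "INSERT INTO a1 ..."),
    WalRecord(2, "B", "INSERT INTO b1 ..."),  # bulk insert on B: read, then dropped
    WalRecord(1, "A", "INSERT INTO a2 ..."),
    WalRecord(1, "A", "", is_commit=True),
    WalRecord(2, "B", "", is_commit=True),
]

print(decode_wal(wal, slot_database="A"))  # only database A's changes come out
```

Database B's bulk-insert records never produce events for the slot on A, yet the loop still iterates over them, which is why a huge bulk insert elsewhere slows decoding down.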