I have a somewhat academic question that has come up during my internship for my university thesis.
So far I've been working on a Kafka cluster, for testing purposes, to connect two MySQL databases to each other, which I've managed to do with the Debezium MySQL connector (source) and the JDBC connector (sink).
Now my boss is exploring the option of using Debezium in production in the long term, but he's not convinced that it can get data from the database without running queries or other operations that put strain on it.
I've taken a look at Debezium's source code and, as expected, I didn't find anything that performs SELECTs or other queries in the module that reads the binlog, but I also haven't been able to work out how Debezium actually manages to read it.
Does Debezium's JDBC implementation have the ability to read the log file in some way? Or does it simply locate the log file and stream it to the connector?
My boss's doubts are not entirely groundless, since the connector needs to connect as a user that has SELECT permission. Still, if it weren't a more elegant solution than periodic polling, it wouldn't be as popular as it is.
The connector uses the MySQL binlog client to connect to your MySQL topology and act as a replica, reading and processing binlog events as they're written to your primary's binary log.
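To make "act as a replica" a bit more concrete, here is a purely conceptual sketch. It is a toy model, not Debezium's code and not the MySQL replication protocol, and `ToyBinlog` and `ToyReplica` are hypothetical names. The point it illustrates is that the server appends every committed change to a log regardless of who is listening, and a replica only remembers a position and receives whatever was appended after it, without querying any table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBinlog:
    """Append-only list standing in for the server's binary log (toy model)."""
    events: list = field(default_factory=list)

    def append(self, event: str) -> None:
        # The server records each committed change here, replica or no replica.
        self.events.append(event)

    def read_from(self, position: int) -> list:
        # Sequential read of everything written at or after `position`.
        return self.events[position:]


@dataclass
class ToyReplica:
    """Tracks a log position, loosely like a replica tracks binlog file/offset."""
    position: int = 0

    def catch_up(self, log: ToyBinlog) -> list:
        # Only events past the remembered position are delivered; no table is read.
        new_events = log.read_from(self.position)
        self.position += len(new_events)
        return new_events


log = ToyBinlog()
replica = ToyReplica()

log.append("INSERT customers id=1")
log.append("UPDATE customers id=1")
first_batch = replica.catch_up(log)

log.append("DELETE customers id=1")
second_batch = replica.catch_up(log)
```

In the real setup the events are pushed to the client over a replication connection rather than pulled in batches, but the bookkeeping idea (a position in an append-only log) is the same.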
So the overhead you'd see from adding Debezium to stream changes from your MySQL database is the same as what you would experience from adding a MySQL replica to your cluster.
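As for the SELECT permission mentioned in the question: as far as I know it is needed for the connector's initial snapshot, which reads the existing rows once before switching to the binlog, and the `snapshot.mode` setting controls that behaviour. Below is a minimal sketch of a connector registration payload that shows where the replica identity (`database.server.id`) and the snapshot mode go. The property names are the ones I recall from the Debezium 2.x MySQL connector documentation and may differ in your version; the host name, credentials, server id, topic names and the helper `build_connector_config` are all made up for illustration.

```python
import json


def build_connector_config(name: str, server_id: int, snapshot_mode: str = "initial") -> dict:
    """Build a Kafka Connect registration payload (hypothetical helper, placeholder values)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            # Placeholder connection details, not taken from the question.
            "database.hostname": "mysql.example.internal",
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "change-me",
            # The connector joins the topology like a replica, so it needs
            # a server id that no other server or replica is using.
            "database.server.id": str(server_id),
            "topic.prefix": "inventory",
            # Controls whether existing rows are read once before streaming starts.
            "snapshot.mode": snapshot_mode,
            # Where the connector keeps the schema history it rebuilds from DDL events.
            "schema.history.internal.kafka.bootstrap.servers": "kafka.example.internal:9092",
            "schema.history.internal.kafka.topic": "schema-history.inventory",
        },
    }


config = build_connector_config("mysql-source", server_id=184054)
payload = json.dumps(config, indent=2)
```

The resulting `payload` is what you would POST to the Kafka Connect REST API to register the connector. If the one-off snapshot read is the part your boss is worried about, the snapshot mode is the setting to look at in the documentation for your Debezium version.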