I have 2 large tables with CDC. The idea is to transfer data to some new tables. So, I'd like to read data from the CDC tables by parts and do not read repeatedly what already was read.
How it can be done?
The fn_cdc_get_net_changes_*
function, (or get all changes if that's your requirement) which is created by SQL Server for a table when you enable it for CDC, takes a from_LSN
and to_LSN
parameter. This is your sliding window. The function returns a __$start_lsn
column. Use that to keep track of where you're up to, and slide your window forward the next time you call it.
Per your comment - yes, after getting the results from the change function, you should store the largest __$start_lsn
value you received. This would then become your from_lsn
parameter on the next call...
... Well, almost. There are a few "gotchas" when it comes to calling these change functions. It's not terribly complex, but it's too much information to provide in a StackOverflow answer. Fortunately there is Microsoft documentation on the subject. See the section on Validating LSN boundaries.
Additionally, note that the from and to parameters to the change functions represent a closed interval. In other words, the function will return data greater than or equal to the from_lsn
value, and less than or equal to the to_lsn
parameter.
So, suppose your algorithm looks like this:
@variable
.@variable
(as per the validating boundaries documentation)@variable
as the from_lsn
parameter value.__$start_lsn
from the result set and save that back to persistent storage.If you do this you will get some of the same rows you got last time - ie, any rows whose LSN is equal to the value in @variable
.
What you should therefore do instead is this:
@variable
.@variable
.@variable
(as per the validating boundaries documentation)@variable
as the from_lsn
parameter value.__$start_lsn
from the result set and save that back to persistent storage.To get the value you should use for the to_LSN
parameter, you can use fn_cdc_get_max_lsn