sql-serverazure-data-factorycdc

Why am I getting a null row when querying a CDC function in an ADF Lookup activity?


I am trying to build an ADF pipeline that incrementally updates a table using Change Data Capture (CDC). And I'm running into an issue when I try to use a CDC built in table function in an ADF Lookup query activity and ADF is receiving a single row filled with NULL values. When I run the exact same function directly in SQL server, I get a valid result.

To debug this issue, I'm following the Microsoft guide here: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-data-capture-feature-portal

Here's my initial issue. I run the following script in an ADF Lookup activity:

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn =sys.fn_cdc_get_min_lsn('my_custom_table');  
SET @to_lsn = sys.fn_cdc_max_lsn(); 
SELECT * FROM cdc.fn_cdc_get_net_changes_my_custom_table(@from_lsn, @to_lsn, 'all')

This returns a single row with all of the appropriate columns and all values set to NULL.

{
    "count":1,
    "value":[
        {
            "__$start_lsn": null,
            "__$operation": null,
            "__$update_mask": null,
            "Company_Code": null,
            "Equipment_Code": null,
            "id": null
        }
    ]
...
}

If I run this same script via SSMS, I receive three rows with the data I expect.

__$start_lsn __$operation __$update_mask Company_Code Equipment_Code id
0x000CBDE900022E98002C 4 NULL 011 50 15
0x000CBDE900054118001C 4 NULL 011 51 16
0x000CBDE9000541A0000C 4 NULL 011 52 17

Debugging, I tried the following. I modified the script to just get a count of rows:

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn =sys.fn_cdc_get_min_lsn('my_custom_table');  
SET @to_lsn = sys.fn_cdc_max_lsn(); 
SELECT COUNT(*) FROM cdc.fn_cdc_get_net_changes_my_custom_table(@from_lsn, @to_lsn, 'all')

This returns the correct result: 3.

{
    "count":1,
    "value": [
        {
            "":3
        }
    ],
...
}

I thought maybe ADF was having problems with one of my return types, so I scoped it so a single column (an int):

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn =sys.fn_cdc_get_min_lsn('my_custom_table');  
SET @to_lsn = sys.fn_cdc_max_lsn(); 
SELECT id FROM cdc.fn_cdc_get_net_changes_my_custom_table(@from_lsn, @to_lsn, 'all')

This returns a single row with only a NULL id.

{
    "count":1,
    "value":[
        {
            "id": null
        }
    ],
...
}

I've tried wrapping the code in a stored procedure, with the same NULL results. If I query the table directly (cdc.my_custom_table_CT), I get the results I expect. When I call other simple CDC functions (like sys.fn_cdc_get_min_lsn and sys.fn_cdc_max_lsn above), I get the results I expect.

Any ideas what I'm doing wrong, or how I can debug the issue?


A commentor asked for the pipeline. It's pretty basic:

{
    "name": "Test",
    "properties": {
        "activities": [
            {
                "name": "Test",
                "type": "Lookup",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "SqlServerSource",
                        "sqlReaderQuery": "DECLARE @from_lsn binary(10), @to_lsn binary(10); \nSET @from_lsn =sys.fn_cdc_get_min_lsn('my_custom_table'); \nSET @to_lsn = sys.fn_cdc_get_max_lsn();\nSELECT * FROM cdc.fn_cdc_get_net_changes_my_custom_table(@from_lsn, @to_lsn, 'all')",
                        "queryTimeout": "02:00:00",
                        "partitionOption": "None"
                    },
                    "dataset": {
                        "referenceName": "TestDataset",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            }
        ],
        "annotations": []
    }
}

And the ADF dataset definition with mapping:

{
    "name": "TestDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "TestLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "SqlServerTable",
        "schema": [
            {
                "name": "__$start_lsn",
                "type": "binary"
            },
            {
                "name": "__$operation",
                "type": "int",
                "precision": 10
            },
            {
                "name": "__$update_mask",
                "type": "varbinary"
            },
            {
                "name": "Company_Code",
                "type": "varchar"
            },
            {
                "name": "Equipment_Code",
                "type": "varchar"
            },
            {
                "name": "id",
                "type": "int",
                "precision": 10
            }
        ],
        "typeProperties": {
            "schema": "cdc",
            "table": "my_custom_table_CT"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

Update (2025/08/29) In order to ensure that I'm pointing at the right database, I created a stored procedure in the database in question, and pointed my ADF Lookup Activity to that instead.

When the stored procedure code looks like this:

CREATE OR ALTER PROCEDURE [dbo].[get_cdc_data]
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn =sys.fn_cdc_get_min_lsn('my_custom_table');  
SET @to_lsn = sys.fn_cdc_max_lsn(); 
SELECT id FROM cdc.fn_cdc_get_net_changes_my_custom_table(@from_lsn, @to_lsn, 'all')

END

I get the same NULL value results. When I change the stored procedure to call the CDC table directly, like this:

CREATE OR ALTER PROCEDURE [dbo].[get_cdc_data]
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;

DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn =sys.fn_cdc_get_min_lsn('my_custom_table');  
SET @to_lsn = sys.fn_cdc_max_lsn(); 
SELECT id FROM cdc.my_custom_table_CT;

END

I get some valid results. (These aren't aggregated, so I'm getting the raw CDC data.)

{
    "count":6,
    "value":[
        {
            "id": 15
        },
        {
            "id": 15
        },
        {
            "id": 16
        },
        {
            "id": 16
        },
        {
            "id": 17
        },
        {
            "id": 17
        }
    ],
...
}

There seems to be an issue where I can't call the CDC table functions directly from ADF (or even indirectly via a stored procedure).


Solution

  • After lots of random trial and error, I found a solution to this problem.

    TLDR: The SQL user we were using in the ADF linked service didn't have the cdc_admin user role. When we enabled it, the queries began working as expected.

    More details: My best guess why this happened -- the auto-generated cdc.fn_cdc_get_net_changes_my_custom_table function code contains calls to other auto-generated CDC functions and stored procedures, including ones in master database schema. The SQL user had permissions to call the main function, but not the sub-functions in the master schema. And cdc.fn_cdc_get_net_changes_my_custom_table gave a bad response instead of failing.

    What I still don't understand: I don't get why the behavior is different when the query is sent from ADF and SSMS. My SQL user can call the table function fine in SSMS. This issue only happens from ADF.

    Update 2025/09/05: I received a response from Microsoft on this behavior. Here's what they said: This behavior is by design. When calling the table function from ADF, the execution context differs from SSMS because ADF uses an external connection that enforces stricter permission checks. The cdc_admin role is required in this context to ensure access to CDC-related metadata. SSMS, on the other hand, runs under your authenticated SQL context, which already has the necessary privileges.