Tags: sql-server, t-sql, query-optimization, sqlclr, query-planner

Resolving Poor Execution Plan in SQL Server with CLR Scalar Function


Essentially, my question boils down to observing these two query plans (from SQL Server 2019):

  1. Plan using a SQL-defined scalar function.

SQL scalar function query plan

  2. Plan using a CLR-defined scalar function.

CLR scalar function query plan

I defined a scalar function that parses a string representation of an IP address into its binary(16) IPv6 value (mapping IPv4 addresses to their IPv6 equivalents). I first implemented it in T-SQL, but then I also implemented it in C# using the built-in IPAddress class to parse the value. I'm attempting to use this function to join a table containing IP address strings to a table of parsed CIDR blocks (clustered index on the [Start] and [End] binary(16) values).
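For context, the function is called like this (sample input is a placeholder; the output comment assumes the standard IPv4-mapped IPv6 form, which is how the question describes the mapping):

```sql
-- Maps an IPv4 string to its IPv4-mapped IPv6 binary(16) form.
SELECT [Parsed] = [dbo].[fn_ParseIP](N'192.0.2.1');
-- e.g. 0x00000000000000000000FFFFC0000201 (::ffff:192.0.2.1)
```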

There were two ways that I wrote the SQL query:

  1. Use the scalar function directly in the JOIN criteria.
SELECT
    *
FROM
    [dbo].[values]
    val
LEFT JOIN
    [dbo].[cidr]
    cidr
        ON [dbo].[fn_ParseIP](val.[IpAddress]) BETWEEN cidr.[Start] AND cidr.[End]
;
  2. Compute the scalar function in an APPLY clause before referencing it in the JOIN.
SELECT
    val.*, cidr.*
FROM
    [dbo].[values]
    val
CROSS APPLY
    (
        SELECT [ParsedIpAddress] = [dbo].[fn_ParseIP](val.[IpAddress])
    )
    calc
LEFT JOIN
    [dbo].[cidr]
    cidr
        ON calc.[ParsedIpAddress] BETWEEN cidr.[Start] AND cidr.[End]
;

In my testing, [dbo].[values] contained 17 rows (and was defined using VALUES rather than being an actual table) and [dbo].[cidr] contained 986,320 rows.
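A minimal reproduction of the [dbo].[values] side might look like this (row contents are placeholders, not taken from the question):

```sql
-- Inline derived table standing in for [dbo].[values] (17 rows in the test).
SELECT val.[IpAddress]
FROM (VALUES
    ('192.168.0.1'),
    ('2001:db8::1')
) AS val([IpAddress]);
```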

When using a scalar function defined in SQL, query 1 takes about 7.5 minutes to run and query 2 takes under 1 second.

When using a CLR scalar function, both queries take about 2.5 minutes to run, but query 2 has an extra node in the query plan to compute the scalar function after the join.

The difference is ultimately that when referencing the scalar function that was defined in SQL, I'm able to get it to generate the first plan where it computes the results of the scalar function first and then uses those as the seek predicate into the [dbo].[cidr] clustered index when performing the join. But when using the CLR function, it always performs the calculation as part of the clustered index seek (and filter), so it is running the function significantly more often to get the results.

My assumption is that this behavior could be due to the query planner believing the function to be nondeterministic, but I have the CLR function implemented with the following attributes:

    [SqlFunction(
        DataAccess = DataAccessKind.None,
        IsDeterministic = true,
        IsPrecise = true
    )]
    [return: SqlFacet(IsFixedLength = true, IsNullable = true, MaxSize = 16)]
    public static SqlBinary fn_CLR_ParseIP([SqlFacet(MaxSize = 50)] SqlString ipAddress)
    {
        // Body elided in the original post; an implementation along these lines:
        if (ipAddress.IsNull) return SqlBinary.Null;
        var parsed = IPAddress.Parse(ipAddress.Value).MapToIPv6();
        return new SqlBinary(parsed.GetAddressBytes());
    }

My hope was that I could rely on the .NET standard library to deal with the IP address parsing for me in SQL. Currently, we have some processes that work with IPv4 addresses only, and I need to update them to work with IPv6 as well. In some of our large databases, this processing is very slow, so I was hoping that the parsing logic in .NET would be more efficient. It seems like the CLR function itself is faster than my SQL implementation, but the effect on the query plan is significantly worse.

I can likely rewrite the query to use temporary tables to parse the IP addresses out first, and that should resolve this issue. I can also get decent results when defining an equivalent CLR table-valued function that returns just a single row.
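The temporary-table rewrite mentioned above might look like this (a sketch, reusing the table, column, and function names from the question):

```sql
-- Parse each IP once into a temp table, so the CLR function is evaluated
-- per source row instead of inside the join predicate.
SELECT
    val.[IpAddress],
    [ParsedIpAddress] = [dbo].[fn_CLR_ParseIP](val.[IpAddress])
INTO #parsed
FROM [dbo].[values] val;

SELECT *
FROM #parsed p
LEFT JOIN [dbo].[cidr] cidr
    ON p.[ParsedIpAddress] BETWEEN cidr.[Start] AND cidr.[End];
```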

However, I'd like to know if there is something that I'm missing that would make it easier to use a CLR scalar function as part of a filter predicate. Is it just a bad idea and I should proceed with some of these alternatives, or is there something that I could do to make the CLR function work as a drop-in replacement for the SQL function?


For anybody who is interested, here is the final query, which performs well using the approach from T N's answer.

SELECT
    *
FROM
    [dbo].[values]
    val
CROSS APPLY
    (
        SELECT [IpAddress] = [dbo].[fn_CLR_ParseIP](val.[IpAddress])
    )
    parsed
OUTER APPLY
    (
        SELECT TOP (1)
            *
        FROM
            [dbo].[cidr]
            _cidr
        WHERE
                _cidr.[range_start] <= parsed.[IpAddress]
            AND _cidr.[range_end]   >= parsed.[IpAddress]
        ORDER BY
            _cidr.[range_start] DESC
    )
    cidr
;

Solution

  • If your cidr table has no overlapping Start/End ranges, so that any given IP address will match at most one row, you can:

    1. Use SELECT TOP 1 ... ORDER BY ... logic to efficiently identify a single candidate row based on the Start column.
    2. Confirm a match by checking the End column value.
    3. Wrap this all up in an OUTER APPLY, with the TOP 1 further wrapped up inside a nested subselect.
    4. The WHERE [End] ... test must be performed after the TOP 1 operation to prevent scanning of additional rows for the case where the (first) candidate row does not match.
    OUTER APPLY (
        SELECT candidate.*
        FROM (
            SELECT TOP (1) cidr.*
            FROM cidr
            WHERE cidr.Start <= calc.ParsedIpAddress
            ORDER BY cidr.Start DESC
        ) candidate
        WHERE candidate.[End] >= calc.ParsedIpAddress
    ) match
    

    An index on either cidr(Start), cidr(Start, [End]), or cidr(Start) INCLUDE([End]) would be needed for efficient operation.
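The covering variant from the options above could be created like this (sketch; the index name is arbitrary):

```sql
-- Narrow index supporting the descending seek on Start,
-- with End included so the residual check avoids a lookup.
CREATE NONCLUSTERED INDEX IX_cidr_Start
    ON [dbo].[cidr] ([Start])
    INCLUDE ([End]);
```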

    If your source cidr table can have arbitrarily overlapping ranges, such that a given IP can potentially have multiple matches, then I know of no simple & efficient lookup method that avoids a range scan. An index on cidr(Start, [End]) would at least perform the range scan on a limited-width (more compact) index rather than the presumably wider table (clustered index). That might provide some performance improvement.

    A more involved solution might involve segregating/partitioning the cidr table based on the number of fixed octets (like the original/obsolete class A, B, C, D subnets) or the number of fixed bits (/16, /24, etc.) - up to 16 octets or 128 bits for IPv6. You would then optimize the lookups into each partition and UNION the results. This would roughly simulate the way SQL Server implements spatial indexes.
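As a rough illustration of that idea, assuming a hypothetical [PrefixLength] column on cidr and an index on ([PrefixLength], [Start]); @ip stands in for the parsed binary(16) address:

```sql
-- Sketch only: one narrow descending seek per prefix-length bucket,
-- each bucket contributing at most one candidate row.
DECLARE @ip binary(16) = 0x00000000000000000000FFFFC0A80001;

SELECT c.*
FROM (SELECT DISTINCT [PrefixLength] FROM [dbo].[cidr]) p
CROSS APPLY
    (
        SELECT TOP (1) _c.*
        FROM [dbo].[cidr] _c
        WHERE _c.[PrefixLength] = p.[PrefixLength]
          AND _c.[Start] <= @ip
          AND _c.[End]   >= @ip
        ORDER BY _c.[Start] DESC
    ) c;
```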