sqlsql-servervarbinarysqlfilestreamdatalength

EndsWith SQL query for VARBINARY(MAX) FILESTREAM columns


Currently I have two SQL query proposals which call DATALENGTH on the same VARBINARY column more than once, due to conditions in the WHERE clause.

Do these multiple calls to DATALENGTH in the same query interfere in the performance, or can the result of the DATALENGTH calculation be optimized/cached by Microsoft SQL Server?

The goal is to build an EndsWith SQL query for VARBINARY(MAX) FILESTREAM columns.

In my case, I cannot add any extra indexes or columns to the table, however I'd like to understand what are the possible alternatives.

-- Table with BinData
DECLARE @foo TABLE(BinData VARBINARY(MAX));
INSERT INTO @foo (BinData) VALUES
(0x00001125), (0x00112500), (0x11250000),
(0x00000011), (0x00001100), (0x00110000), (0x11000000),
(0x112500001125);

-- BinKey
DECLARE @BinKey VARBINARY(MAX) = 0x1125;

-- Search for [BinData] ending with @BinKey --

What I have tried so far...

Proposal A - REVERSE BinKey and BinData, and then use CHARINDEX.

SELECT * FROM @foo WHERE
    DATALENGTH(BinData) >= DATALENGTH(@BinKey) AND
    CHARINDEX
    (
        CONVERT(VARCHAR(MAX), @BinKey),
        CONVERT(VARCHAR(MAX), BinData)
    ) = 1 + DATALENGTH(BinData) - DATALENGTH(@BinKey);

Proposal B - Cut the ending part of the BinData and compare it to BinKey.

SELECT * FROM @foo WHERE
    DATALENGTH(BinData) >= DATALENGTH(@BinKey) AND
    SUBSTRING(
        BinData,
        1 + DATALENGTH(BinData) - DATALENGTH(@BinKey),
        DATALENGTH(@BinKey)
    ) = @BinKey;

Proposal from James L. - Simply use the RIGHT function.

SELECT * FROM @foo WHERE
    RIGHT(BinData, DATALENGTH(@BinKey)) = @BinKey;

The result for the queries listed above must be:

0x00001125
0x112500001125

Is there any better algorithm for this? How well would it perform for larger blobs?


Solution

  • I'm not sure how to analyze how this will perform, but this does what you asked for:

    -- Table with BinData
    DECLARE @foo TABLE(BinData VARBINARY(MAX));
    INSERT INTO @foo (BinData) VALUES
    (0x00001125), (0x00112500), (0x11250000),
    (0x00000011), (0x00001100), (0x00110000), (0x11000000),
    (0x112500001125);
    
    -- BinKey
    DECLARE @BinKey VARBINARY(MAX) = 0x1125;
    
    -- Search for [BinData] ending with @BinKey --
    select *
    from   @foo
    where  right(BinData, datalength(@BinKey)) = @BinKey
    

    Which returns the following:

    BinData
    ----------------
    0x00001125
    0x112500001125
    

    I added 10,000 records, each about 1kb in size. It took .020 seconds to query all of them. I also tried adding 10 records 1MB in size after the 10,000 1kb records. It took .037 seconds to run the query. When the 1kb or 1MB records end with 0x1125, then it increases the execution time by a little bit, because SSMS has to receive and format the record in the output window. 50 1kb rows that end in 0x1125 took .057 seconds to run. When one of the 1MB rows ended in 0x1125, it took .110 seconds to run. It's not very scientific, but it seems to perform adequately.