sqlsql-servert-sqlsql-server-2016bitcount

How can I efficiently convert a bigint into a base 2 string and count the 1s?


I have a bigint column or variable. I want to convert it into its bitmap (base 2) representation as a varchar and additionally to that count the 1s.

I need both components - not just the bit_count equivalent

Example:

DECLARE @a bigint=-9151314442816847872
,@b bigint=71776119061217280

-9151314442816847872 converts to 1000000100000000000000000000000000000000000000000000000000000000 and there are 2 ones in the bitmap.

71776119061217280 converts to 0000000011111111000000000000000000000000000000000000000000000000 and there are 8 ones in it

The questions are:

  1. How to quickly convert from bigint to bitmap? and
  2. How to quickly counts 1s in the bitmap?

There are millions of numbers to process in a very short time. Every millisecond counts!

Solution needs to be compatible with SQL Server 2016.

Here is my code.

My algorithm converts 1mln numbers in 18 seconds. This is slow if millions of numbers have to be converted. BIT_COUNT() is magnitude of orders faster, but it is available since SQL 2022. I need a solution compatible with 2016.

Can it be done faster?

DECLARE @a bigint=71776119061217280;
DECLARE @s char(64);

SET @s= 
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
CONVERT(varchar,CAST(@a as binary(8)),2)
,'0','0000'),'1','0001'),'2','0010'),'3','0011'),'4','0100'),'5','0101'),'6','0110'),'7','0111')
,'8','1000'),'9','1001'),'A','1010'),'B','1011'),'C','1100'),'D','1101'),'E','1110'),'F','1111');

PRINT 'Bitmap: '+@s;
PRINT '1s count: '+CAST(64-LEN(REPLACE(@s,'1','')) as varchar);

Solution

  • For evaluating the bitmap string I do see considerable improvement with using the CONCAT function to combine the 64 component characters vs the method given in the question.

    In the below test in SQL Server 2022 the difference was 1.6 seconds vs 44 seconds on my machine (function definition at foot of answer)

    SET STATISTICS TIME ON;
    
    DECLARE @Base2String CHAR(64)
    
    SELECT @Base2String = Base2String
    FROM generate_series(1,1000000)
    CROSS APPLY dbo.BigIntToBase2String (value)
    
    SELECT @Base2String = Base2StringOrig
    FROM generate_series(1,1000000)
    CROSS APPLY dbo.BigIntToBase2String (value)
    

    evaluating the bitmap count

    Originally my claim here was that it was quicker to just re-evaluate this from scratch but I have changed my mind on this.

    I have found that calculating the bitmap count from the string can be quicker as long as

    The below tries various permutations with indicative execution times on my machine given in the comments

    -- CPU time = 2579 ms,  elapsed time = 2692 ms.
    SELECT @BitCount = BitCount, 
           @Base2String = Base2String
    FROM generate_series(1,1000000)
    CROSS APPLY dbo.BigIntToBase2String (value)
    
    
    PRINT 'Method 1'
    --   CPU time = 3828 ms,  elapsed time = 4066 ms.
    --  Execution plan shows that Base2String is being calculated twice
    SELECT @BitCount = 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')), 
           @Base2String = Base2String
    FROM 
    (
    SELECT Base2String
    FROM generate_series(1,1000000)
    CROSS APPLY dbo.BigIntToBase2String (value)
    ) ca
    
    PRINT 'Method 2'
    --   CPU time = 3969 ms,  elapsed time = 4125 ms.
    --   only computes once but adds overhead of nested loops operator 
    SELECT @BitCount = 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')), 
           @Base2String = Base2String
    FROM 
    (
    SELECT Base2String = Base2String
    FROM generate_series(1,1000000)
    OUTER APPLY dbo.BigIntToBase2String (value)
    ) ca
    OPTION (MAXDOP 1)
    

    Based on the above evaluating separately is the winner but I have since tried...

    
    PRINT 'Method 3'
    --  CPU time = 1609 ms,  elapsed time = 1717 ms.
    SELECT @BitCount = 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')), 
           @Base2String = Base2String
    FROM 
    (
    SELECT Base2String = CONCAT(Base2String, CRYPT_GEN_RANDOM(0))
    FROM generate_series(1,1000000)
    CROSS APPLY dbo.BigIntToBase2String (value)
    ) ca
    
    PRINT 'Method 3.1'
    --  CPU time = 5782 ms,  elapsed time = 6066 ms.
    SELECT @BitCount = 64-LEN(REPLACE(Base2String,'1','')), 
           @Base2String = Base2String
    FROM 
    (
    SELECT Base2String = CONCAT(Base2String, CRYPT_GEN_RANDOM(0))
    FROM generate_series(1,1000000)
    CROSS APPLY dbo.BigIntToBase2String (value)
    ) ca
    

    The chicanery with CRYPT_GEN_RANDOM is based on the idea here and it does now only have a single evaluation of Base2String without adding any nested loops to the plan, at least in my tests this did become the new winner though omitting the binary collate clause did incur a significant penalty

    Function definition used in above tests

    Credit to @Habo for optimizing the BitCount part

    CREATE FUNCTION [dbo].[BigIntToBase2String] 
    (   
        @input bigint
    )
    RETURNS TABLE 
    AS
    RETURN 
    (
    SELECT
    /*Original method*/
    Base2StringOrig = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    CONVERT(varchar,CAST(@input as binary(8)),2)
    ,'0','0000'),'1','0001'),'2','0010'),'3','0011'),'4','0100'),'5','0101'),'6','0110'),'7','0111')
    ,'8','1000'),'9','1001'),'A','1010'),'B','1011'),'C','1100'),'D','1101'),'E','1110'),'F','1111'),
    
    /*New method*/
    Base2String = CONCAT(
    IIF(@input & 0x8000000000000000 = 0, '0','1'),
    IIF(@input & 0x4000000000000000 = 0, '0','1'),
    IIF(@input & 0x2000000000000000 = 0, '0','1'),
    IIF(@input & 0x1000000000000000 = 0, '0','1'),
    IIF(@input & 0x0800000000000000 = 0, '0','1'),
    IIF(@input & 0x0400000000000000 = 0, '0','1'),
    IIF(@input & 0x0200000000000000 = 0, '0','1'),
    IIF(@input & 0x0100000000000000 = 0, '0','1'),
    IIF(@input & 0x0080000000000000 = 0, '0','1'),
    IIF(@input & 0x0040000000000000 = 0, '0','1'),
    IIF(@input & 0x0020000000000000 = 0, '0','1'),
    IIF(@input & 0x0010000000000000 = 0, '0','1'),
    IIF(@input & 0x0008000000000000 = 0, '0','1'),
    IIF(@input & 0x0004000000000000 = 0, '0','1'),
    IIF(@input & 0x0002000000000000 = 0, '0','1'),
    IIF(@input & 0x0001000000000000 = 0, '0','1'),
    IIF(@input & 0x0000800000000000 = 0, '0','1'),
    IIF(@input & 0x0000400000000000 = 0, '0','1'),
    IIF(@input & 0x0000200000000000 = 0, '0','1'),
    IIF(@input & 0x0000100000000000 = 0, '0','1'),
    IIF(@input & 0x0000080000000000 = 0, '0','1'),
    IIF(@input & 0x0000040000000000 = 0, '0','1'),
    IIF(@input & 0x0000020000000000 = 0, '0','1'),
    IIF(@input & 0x0000010000000000 = 0, '0','1'),
    IIF(@input & 0x0000008000000000 = 0, '0','1'),
    IIF(@input & 0x0000004000000000 = 0, '0','1'),
    IIF(@input & 0x0000002000000000 = 0, '0','1'),
    IIF(@input & 0x0000001000000000 = 0, '0','1'),
    IIF(@input & 0x0000000800000000 = 0, '0','1'),
    IIF(@input & 0x0000000400000000 = 0, '0','1'),
    IIF(@input & 0x0000000200000000 = 0, '0','1'),
    IIF(@input & 0x0000000100000000 = 0, '0','1'),
    IIF(@input & 0x0000000080000000 = 0, '0','1'),
    IIF(@input & 0x0000000040000000 = 0, '0','1'),
    IIF(@input & 0x0000000020000000 = 0, '0','1'),
    IIF(@input & 0x0000000010000000 = 0, '0','1'),
    IIF(@input & 0x0000000008000000 = 0, '0','1'),
    IIF(@input & 0x0000000004000000 = 0, '0','1'),
    IIF(@input & 0x0000000002000000 = 0, '0','1'),
    IIF(@input & 0x0000000001000000 = 0, '0','1'),
    IIF(@input & 0x0000000000800000 = 0, '0','1'),
    IIF(@input & 0x0000000000400000 = 0, '0','1'),
    IIF(@input & 0x0000000000200000 = 0, '0','1'),
    IIF(@input & 0x0000000000100000 = 0, '0','1'),
    IIF(@input & 0x0000000000080000 = 0, '0','1'),
    IIF(@input & 0x0000000000040000 = 0, '0','1'),
    IIF(@input & 0x0000000000020000 = 0, '0','1'),
    IIF(@input & 0x0000000000010000 = 0, '0','1'),
    IIF(@input & 0x0000000000008000 = 0, '0','1'),
    IIF(@input & 0x0000000000004000 = 0, '0','1'),
    IIF(@input & 0x0000000000002000 = 0, '0','1'),
    IIF(@input & 0x0000000000001000 = 0, '0','1'),
    IIF(@input & 0x0000000000000800 = 0, '0','1'),
    IIF(@input & 0x0000000000000400 = 0, '0','1'),
    IIF(@input & 0x0000000000000200 = 0, '0','1'),
    IIF(@input & 0x0000000000000100 = 0, '0','1'),
    IIF(@input & 0x0000000000000080 = 0, '0','1'),
    IIF(@input & 0x0000000000000040 = 0, '0','1'),
    IIF(@input & 0x0000000000000020 = 0, '0','1'),
    IIF(@input & 0x0000000000000010 = 0, '0','1'),
    IIF(@input & 0x0000000000000008 = 0, '0','1'),
    IIF(@input & 0x0000000000000004 = 0, '0','1'),
    IIF(@input & 0x0000000000000002 = 0, '0','1'),
    IIF(@input & 0x0000000000000001 = 0, '0','1')
    ), 
    BitCount = 
    - SIGN(@input & 0x8000000000000000) 
    + SIGN(@input & 0x4000000000000000) 
    + SIGN(@input & 0x2000000000000000) 
    + SIGN(@input & 0x1000000000000000) 
    + SIGN(@input & 0x0800000000000000) 
    + SIGN(@input & 0x0400000000000000) 
    + SIGN(@input & 0x0200000000000000) 
    + SIGN(@input & 0x0100000000000000) 
    + SIGN(@input & 0x0080000000000000) 
    + SIGN(@input & 0x0040000000000000) 
    + SIGN(@input & 0x0020000000000000) 
    + SIGN(@input & 0x0010000000000000) 
    + SIGN(@input & 0x0008000000000000) 
    + SIGN(@input & 0x0004000000000000) 
    + SIGN(@input & 0x0002000000000000) 
    + SIGN(@input & 0x0001000000000000) 
    + SIGN(@input & 0x0000800000000000) 
    + SIGN(@input & 0x0000400000000000) 
    + SIGN(@input & 0x0000200000000000) 
    + SIGN(@input & 0x0000100000000000) 
    + SIGN(@input & 0x0000080000000000) 
    + SIGN(@input & 0x0000040000000000) 
    + SIGN(@input & 0x0000020000000000) 
    + SIGN(@input & 0x0000010000000000) 
    + SIGN(@input & 0x0000008000000000) 
    + SIGN(@input & 0x0000004000000000) 
    + SIGN(@input & 0x0000002000000000) 
    + SIGN(@input & 0x0000001000000000) 
    + SIGN(@input & 0x0000000800000000) 
    + SIGN(@input & 0x0000000400000000) 
    + SIGN(@input & 0x0000000200000000) 
    + SIGN(@input & 0x0000000100000000) 
    + SIGN(@input & 0x0000000080000000) 
    + SIGN(@input & 0x0000000040000000) 
    + SIGN(@input & 0x0000000020000000) 
    + SIGN(@input & 0x0000000010000000) 
    + SIGN(@input & 0x0000000008000000) 
    + SIGN(@input & 0x0000000004000000) 
    + SIGN(@input & 0x0000000002000000) 
    + SIGN(@input & 0x0000000001000000) 
    + SIGN(@input & 0x0000000000800000) 
    + SIGN(@input & 0x0000000000400000) 
    + SIGN(@input & 0x0000000000200000) 
    + SIGN(@input & 0x0000000000100000) 
    + SIGN(@input & 0x0000000000080000) 
    + SIGN(@input & 0x0000000000040000) 
    + SIGN(@input & 0x0000000000020000) 
    + SIGN(@input & 0x0000000000010000) 
    + SIGN(@input & 0x0000000000008000) 
    + SIGN(@input & 0x0000000000004000) 
    + SIGN(@input & 0x0000000000002000) 
    + SIGN(@input & 0x0000000000001000) 
    + SIGN(@input & 0x0000000000000800) 
    + SIGN(@input & 0x0000000000000400) 
    + SIGN(@input & 0x0000000000000200) 
    + SIGN(@input & 0x0000000000000100) 
    + SIGN(@input & 0x0000000000000080) 
    + SIGN(@input & 0x0000000000000040) 
    + SIGN(@input & 0x0000000000000020) 
    + SIGN(@input & 0x0000000000000010) 
    + SIGN(@input & 0x0000000000000008) 
    + SIGN(@input & 0x0000000000000004) 
    + SIGN(@input & 0x0000000000000002) 
    + SIGN(@input & 0x0000000000000001)
    )