I have a bigint
column or variable. I want to convert it into its bitmap (base 2) representation as a varchar
and additionally to that count the 1s.
I need both components - not just the bit_count
equivalent
Example:
DECLARE @a bigint=-9151314442816847872
,@b bigint=71776119061217280
-9151314442816847872
converts to 1000000100000000000000000000000000000000000000000000000000000000
and there are 2 ones in the bitmap.
71776119061217280
converts to
0000000011111111000000000000000000000000000000000000000000000000
and there are 8 ones in it
The questions are:
There are millions of numbers to process in a very short time. Every millisecond counts!
Solution needs to be compatible with SQL Server 2016.
Here is my code.
My algorithm converts 1mln numbers in 18 seconds. This is slow if millions of numbers have to be converted.
BIT_COUNT()
is magnitude of orders faster, but it is available since SQL 2022. I need a solution compatible with 2016.
Can it be done faster?
DECLARE @a bigint=71776119061217280;
DECLARE @s char(64);
SET @s=
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
CONVERT(varchar,CAST(@a as binary(8)),2)
,'0','0000'),'1','0001'),'2','0010'),'3','0011'),'4','0100'),'5','0101'),'6','0110'),'7','0111')
,'8','1000'),'9','1001'),'A','1010'),'B','1011'),'C','1100'),'D','1101'),'E','1110'),'F','1111');
PRINT 'Bitmap: '+@s;
PRINT '1s count: '+CAST(64-LEN(REPLACE(@s,'1','')) as varchar);
For evaluating the bitmap string I do see considerable improvement with using the CONCAT
function to combine the 64 component characters vs the method given in the question.
In the below test in SQL Server 2022 the difference was 1.6 seconds vs 44 seconds on my machine (function definition at foot of answer)
SET STATISTICS TIME ON;
DECLARE @Base2String CHAR(64)
SELECT @Base2String = Base2String
FROM generate_series(1,1000000)
CROSS APPLY dbo.BigIntToBase2String (value)
SELECT @Base2String = Base2StringOrig
FROM generate_series(1,1000000)
CROSS APPLY dbo.BigIntToBase2String (value)
evaluating the bitmap count
Originally my claim here was that it was quicker to just re-evaluate this from scratch but I have changed my mind on this.
I have found that calculating the bitmap count from the string can be quicker as long as
The below tries various permutations with indicative execution times on my machine given in the comments
-- CPU time = 2579 ms, elapsed time = 2692 ms.
SELECT @BitCount = BitCount,
@Base2String = Base2String
FROM generate_series(1,1000000)
CROSS APPLY dbo.BigIntToBase2String (value)
PRINT 'Method 1'
-- CPU time = 3828 ms, elapsed time = 4066 ms.
-- Execution plan shows that Base2String is being calculated twice
SELECT @BitCount = 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')),
@Base2String = Base2String
FROM
(
SELECT Base2String
FROM generate_series(1,1000000)
CROSS APPLY dbo.BigIntToBase2String (value)
) ca
PRINT 'Method 2'
-- CPU time = 3969 ms, elapsed time = 4125 ms.
-- only computes once but adds overhead of nested loops operator
SELECT @BitCount = 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')),
@Base2String = Base2String
FROM
(
SELECT Base2String = Base2String
FROM generate_series(1,1000000)
OUTER APPLY dbo.BigIntToBase2String (value)
) ca
OPTION (MAXDOP 1)
Based on the above evaluating separately is the winner but I have since tried...
PRINT 'Method 3'
-- CPU time = 1609 ms, elapsed time = 1717 ms.
SELECT @BitCount = 64-LEN(REPLACE(Base2String COLLATE Latin1_General_BIN2,'1','')),
@Base2String = Base2String
FROM
(
SELECT Base2String = CONCAT(Base2String, CRYPT_GEN_RANDOM(0))
FROM generate_series(1,1000000)
CROSS APPLY dbo.BigIntToBase2String (value)
) ca
PRINT 'Method 3.1'
-- CPU time = 5782 ms, elapsed time = 6066 ms.
SELECT @BitCount = 64-LEN(REPLACE(Base2String,'1','')),
@Base2String = Base2String
FROM
(
SELECT Base2String = CONCAT(Base2String, CRYPT_GEN_RANDOM(0))
FROM generate_series(1,1000000)
CROSS APPLY dbo.BigIntToBase2String (value)
) ca
The chicanery with CRYPT_GEN_RANDOM
is based on the idea here and it does now only have a single evaluation of Base2String
without adding any nested loops to the plan, at least in my tests this did become the new winner though omitting the binary collate clause did incur a significant penalty
Function definition used in above tests
Credit to @Habo for optimizing the BitCount
part
CREATE FUNCTION [dbo].[BigIntToBase2String]
(
@input bigint
)
RETURNS TABLE
AS
RETURN
(
SELECT
/*Original method*/
Base2StringOrig = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
CONVERT(varchar,CAST(@input as binary(8)),2)
,'0','0000'),'1','0001'),'2','0010'),'3','0011'),'4','0100'),'5','0101'),'6','0110'),'7','0111')
,'8','1000'),'9','1001'),'A','1010'),'B','1011'),'C','1100'),'D','1101'),'E','1110'),'F','1111'),
/*New method*/
Base2String = CONCAT(
IIF(@input & 0x8000000000000000 = 0, '0','1'),
IIF(@input & 0x4000000000000000 = 0, '0','1'),
IIF(@input & 0x2000000000000000 = 0, '0','1'),
IIF(@input & 0x1000000000000000 = 0, '0','1'),
IIF(@input & 0x0800000000000000 = 0, '0','1'),
IIF(@input & 0x0400000000000000 = 0, '0','1'),
IIF(@input & 0x0200000000000000 = 0, '0','1'),
IIF(@input & 0x0100000000000000 = 0, '0','1'),
IIF(@input & 0x0080000000000000 = 0, '0','1'),
IIF(@input & 0x0040000000000000 = 0, '0','1'),
IIF(@input & 0x0020000000000000 = 0, '0','1'),
IIF(@input & 0x0010000000000000 = 0, '0','1'),
IIF(@input & 0x0008000000000000 = 0, '0','1'),
IIF(@input & 0x0004000000000000 = 0, '0','1'),
IIF(@input & 0x0002000000000000 = 0, '0','1'),
IIF(@input & 0x0001000000000000 = 0, '0','1'),
IIF(@input & 0x0000800000000000 = 0, '0','1'),
IIF(@input & 0x0000400000000000 = 0, '0','1'),
IIF(@input & 0x0000200000000000 = 0, '0','1'),
IIF(@input & 0x0000100000000000 = 0, '0','1'),
IIF(@input & 0x0000080000000000 = 0, '0','1'),
IIF(@input & 0x0000040000000000 = 0, '0','1'),
IIF(@input & 0x0000020000000000 = 0, '0','1'),
IIF(@input & 0x0000010000000000 = 0, '0','1'),
IIF(@input & 0x0000008000000000 = 0, '0','1'),
IIF(@input & 0x0000004000000000 = 0, '0','1'),
IIF(@input & 0x0000002000000000 = 0, '0','1'),
IIF(@input & 0x0000001000000000 = 0, '0','1'),
IIF(@input & 0x0000000800000000 = 0, '0','1'),
IIF(@input & 0x0000000400000000 = 0, '0','1'),
IIF(@input & 0x0000000200000000 = 0, '0','1'),
IIF(@input & 0x0000000100000000 = 0, '0','1'),
IIF(@input & 0x0000000080000000 = 0, '0','1'),
IIF(@input & 0x0000000040000000 = 0, '0','1'),
IIF(@input & 0x0000000020000000 = 0, '0','1'),
IIF(@input & 0x0000000010000000 = 0, '0','1'),
IIF(@input & 0x0000000008000000 = 0, '0','1'),
IIF(@input & 0x0000000004000000 = 0, '0','1'),
IIF(@input & 0x0000000002000000 = 0, '0','1'),
IIF(@input & 0x0000000001000000 = 0, '0','1'),
IIF(@input & 0x0000000000800000 = 0, '0','1'),
IIF(@input & 0x0000000000400000 = 0, '0','1'),
IIF(@input & 0x0000000000200000 = 0, '0','1'),
IIF(@input & 0x0000000000100000 = 0, '0','1'),
IIF(@input & 0x0000000000080000 = 0, '0','1'),
IIF(@input & 0x0000000000040000 = 0, '0','1'),
IIF(@input & 0x0000000000020000 = 0, '0','1'),
IIF(@input & 0x0000000000010000 = 0, '0','1'),
IIF(@input & 0x0000000000008000 = 0, '0','1'),
IIF(@input & 0x0000000000004000 = 0, '0','1'),
IIF(@input & 0x0000000000002000 = 0, '0','1'),
IIF(@input & 0x0000000000001000 = 0, '0','1'),
IIF(@input & 0x0000000000000800 = 0, '0','1'),
IIF(@input & 0x0000000000000400 = 0, '0','1'),
IIF(@input & 0x0000000000000200 = 0, '0','1'),
IIF(@input & 0x0000000000000100 = 0, '0','1'),
IIF(@input & 0x0000000000000080 = 0, '0','1'),
IIF(@input & 0x0000000000000040 = 0, '0','1'),
IIF(@input & 0x0000000000000020 = 0, '0','1'),
IIF(@input & 0x0000000000000010 = 0, '0','1'),
IIF(@input & 0x0000000000000008 = 0, '0','1'),
IIF(@input & 0x0000000000000004 = 0, '0','1'),
IIF(@input & 0x0000000000000002 = 0, '0','1'),
IIF(@input & 0x0000000000000001 = 0, '0','1')
),
BitCount =
- SIGN(@input & 0x8000000000000000)
+ SIGN(@input & 0x4000000000000000)
+ SIGN(@input & 0x2000000000000000)
+ SIGN(@input & 0x1000000000000000)
+ SIGN(@input & 0x0800000000000000)
+ SIGN(@input & 0x0400000000000000)
+ SIGN(@input & 0x0200000000000000)
+ SIGN(@input & 0x0100000000000000)
+ SIGN(@input & 0x0080000000000000)
+ SIGN(@input & 0x0040000000000000)
+ SIGN(@input & 0x0020000000000000)
+ SIGN(@input & 0x0010000000000000)
+ SIGN(@input & 0x0008000000000000)
+ SIGN(@input & 0x0004000000000000)
+ SIGN(@input & 0x0002000000000000)
+ SIGN(@input & 0x0001000000000000)
+ SIGN(@input & 0x0000800000000000)
+ SIGN(@input & 0x0000400000000000)
+ SIGN(@input & 0x0000200000000000)
+ SIGN(@input & 0x0000100000000000)
+ SIGN(@input & 0x0000080000000000)
+ SIGN(@input & 0x0000040000000000)
+ SIGN(@input & 0x0000020000000000)
+ SIGN(@input & 0x0000010000000000)
+ SIGN(@input & 0x0000008000000000)
+ SIGN(@input & 0x0000004000000000)
+ SIGN(@input & 0x0000002000000000)
+ SIGN(@input & 0x0000001000000000)
+ SIGN(@input & 0x0000000800000000)
+ SIGN(@input & 0x0000000400000000)
+ SIGN(@input & 0x0000000200000000)
+ SIGN(@input & 0x0000000100000000)
+ SIGN(@input & 0x0000000080000000)
+ SIGN(@input & 0x0000000040000000)
+ SIGN(@input & 0x0000000020000000)
+ SIGN(@input & 0x0000000010000000)
+ SIGN(@input & 0x0000000008000000)
+ SIGN(@input & 0x0000000004000000)
+ SIGN(@input & 0x0000000002000000)
+ SIGN(@input & 0x0000000001000000)
+ SIGN(@input & 0x0000000000800000)
+ SIGN(@input & 0x0000000000400000)
+ SIGN(@input & 0x0000000000200000)
+ SIGN(@input & 0x0000000000100000)
+ SIGN(@input & 0x0000000000080000)
+ SIGN(@input & 0x0000000000040000)
+ SIGN(@input & 0x0000000000020000)
+ SIGN(@input & 0x0000000000010000)
+ SIGN(@input & 0x0000000000008000)
+ SIGN(@input & 0x0000000000004000)
+ SIGN(@input & 0x0000000000002000)
+ SIGN(@input & 0x0000000000001000)
+ SIGN(@input & 0x0000000000000800)
+ SIGN(@input & 0x0000000000000400)
+ SIGN(@input & 0x0000000000000200)
+ SIGN(@input & 0x0000000000000100)
+ SIGN(@input & 0x0000000000000080)
+ SIGN(@input & 0x0000000000000040)
+ SIGN(@input & 0x0000000000000020)
+ SIGN(@input & 0x0000000000000010)
+ SIGN(@input & 0x0000000000000008)
+ SIGN(@input & 0x0000000000000004)
+ SIGN(@input & 0x0000000000000002)
+ SIGN(@input & 0x0000000000000001)
)