I would like to find gaps in a sequence and summarize the findings in the following way:
number sequence: 2, 3, 4, 8, 9, 12, 13, 14, 15
missing numbers: 0, 1, 5, 6, 7, 10, 11
min number: 0 (always)
max number: max number of the sequence (15 in this example)
The summary should look like:
From | To | # of missing
00 | 01 | 2
05 | 07 | 3
10 | 11 | 2
I am using SQL server and in reality, the sequence will contain many more numbers (close to a million). I have found many scripts that find and list the missing numbers in the sequence, but I can't figure out how to summarize it in the desired way.
If it helps, the field is called BELNR
and the table is called BSEG
.
EDIT: with the help from the Gaps and Islands material, I have been able to find a solution (may not be the most optimal one, but I think it works):
with C as
(
select belnr, row_number() over(order by belnr) as rownum
from bseg
)
select cast(Cur.belnr as bigint) + 1 as [From], cast(nxt.belnr as bigint) - 1 as [To], (cast(nxt.belnr as bigint) - 1) - (cast(Cur.belnr as bigint) + 1) + 1 as [# of Missing]
from C as Cur
join C as Nxt
on Nxt.rownum = cast(Cur.rownum as int) +1
Where cast(nxt.belnr as bigint) - cast(Cur.belnr as bigint) > 1
This is called Islands and Gaps problem. Read more here:
https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
The word ‘Gaps’ in the title refers to gaps in sequences of values. Islands are unbroken sequences delimited by gaps. The ‘Gaps and Islands’ problem is that of using SQL to rapidly detect the unbroken sequences, and the extent of the gaps between them in a column.