I am trying to insert data from a source table in to a temporary table using the NEWID()
function so that I get a (fairly) random selection of lines from my source table.
Looking at the below code, I insert the data I need into the temp table #x
and at Point 1 where I select from #x
, it returns the data in a random order.
However, at Point 2 WHERE I am narrowing down the data from #x
(I accumulate lines until a certain quantity is reached) the SELECT no longer returns random rows - it returns rows in a sequential order from the start of the table.
DROP TABLE IF EXISTS #x
CREATE TABLE #x (Id INT, Commodity VARCHAR(3), Quantity FLOAT, RowNum INT, TotalQuantity FLOAT)
INSERT INTO #x (id,commodity,quantity,rownum,totalquantity)
SELECT
i.id, i.commodity, i.quantity, ROW_NUMBER() OVER (ORDER BY i.id), SUM(i.quantity) OVER (ORDER BY i.id RANGE UNBOUNDED PRECEDING)
FROM inventory i
WHERE .........
.........
ORDER BY NEWID()
SELECT * FROM #x -------- **POINT 1**
DECLARE @y INT = (SELECT MIN(rownum) AS minrownum FROM #x WHERE totalquantity >= @tonnes)
SELECT #x.id, #x.commodity, #x.quantity, #x.rownum, #x.totalquantity FROM #x
WHERE #x.rownum <= @y
ORDER BY NEWID() -------- **POINT 2**
Any ideas on what I am missing?
Thanks.
LOL. Your row number is deterministic. There may be better ways to do what you want, but you can fix the above code by randomizing the row number:
ROW_NUMBER() OVER (ORDER BY newid())
The outer ORDER BY
is probably unnecessary.
Your query, though is quite confusing. It is selecting the first N rows (by id) that sum up to the total quantity. This makes a lot of sense. I'm not sure what all the randomization is for.
EDIT:
If you need to get random rows until a certain number is reached, you can do:
SELECT i.*
FROM (SELECT i.*, SUM(i.quantity) OVER (ORDER BY NEWID()) as cume_quantity
FROM inventory i
WHERE .........
.........
) i
WHERE cume_quantity - quantity < @tonnes;
You don't need a temporary table. You don't need additional queries.