Tags: sql, sql-server, t-sql, temp-tables, newid

SQL - Returning random rows from a temporary table using NEWID()


I am trying to insert data from a source table into a temporary table, using the NEWID() function so that I get a (fairly) random selection of rows from my source table.

In the code below, I insert the data I need into the temp table #x. At Point 1, where I select from #x, the data comes back in a random order.

However, at Point 2, where I narrow down the data from #x (I accumulate rows until a certain quantity is reached), the SELECT no longer returns random rows: it returns rows in sequential order from the start of the table.

DROP TABLE IF EXISTS #x
CREATE TABLE #x (Id INT, Commodity VARCHAR(3), Quantity FLOAT, RowNum INT, TotalQuantity FLOAT)
INSERT INTO #x (id,commodity,quantity,rownum,totalquantity)
  SELECT
    i.id, i.commodity, i.quantity, ROW_NUMBER() OVER (ORDER BY i.id), SUM(i.quantity) OVER (ORDER BY i.id RANGE UNBOUNDED PRECEDING)
  FROM inventory i
  WHERE .........
        .........
  ORDER BY NEWID()


SELECT * FROM #x   -------- **POINT 1**


DECLARE @y INT = (SELECT MIN(rownum) AS minrownum FROM #x WHERE totalquantity >= @tonnes)


SELECT #x.id, #x.commodity, #x.quantity, #x.rownum, #x.totalquantity FROM #x
WHERE #x.rownum <= @y
ORDER BY NEWID()         -------- **POINT 2**

Any ideas on what I am missing?

Thanks.


Solution

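  • Your row number is deterministic. Both ROW_NUMBER() and the running SUM() are ordered by i.id, so the filter rownum <= @y always keeps the same leading rows by id; the ORDER BY NEWID() at Point 2 only shuffles the order in which those rows are displayed. There may be better ways to do what you want, but you can fix the above code by randomizing the row number: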

    ROW_NUMBER() OVER (ORDER BY newid())
    

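    The outer ORDER BY on the INSERT is probably unnecessary: SQL Server does not guarantee that rows are read back from a table in insertion order unless the SELECT that reads them has its own ORDER BY.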

    Your query, though, is quite confusing. Without the randomization, it selects the first N rows (by id) whose quantities add up to the target quantity, which makes a lot of sense. I'm not sure what all the randomization is for.
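If the temp table is kept, the row number and the running total need to follow the same random order, otherwise the cutoff row means nothing. One way to be sure of that is to store a random key per row once and order both window functions by it. The following is a minimal sketch under assumptions, not the asker's actual schema: #inv stands in for the filtered inventory rows, and the RandKey column, the sample values, and @tonnes = 25 are all made up for illustration. It also assumes quantities are positive.

```sql
-- Sketch only: #inv, RandKey, the sample rows and @tonnes = 25 are hypothetical.
DROP TABLE IF EXISTS #inv;
CREATE TABLE #inv (Id INT, Commodity VARCHAR(3), Quantity FLOAT);
INSERT INTO #inv (Id, Commodity, Quantity)
VALUES (1, 'WHT', 10), (2, 'WHT', 5), (3, 'BRL', 20), (4, 'BRL', 15), (5, 'OAT', 8);

DECLARE @tonnes FLOAT = 25;

-- Store one random key per row so every later step sees the same shuffle.
DROP TABLE IF EXISTS #x;
CREATE TABLE #x (Id INT, Commodity VARCHAR(3), Quantity FLOAT, RandKey UNIQUEIDENTIFIER);
INSERT INTO #x (Id, Commodity, Quantity, RandKey)
SELECT Id, Commodity, Quantity, NEWID()
FROM #inv;

-- Number and accumulate in the stored random order, then keep rows up to
-- and including the one whose running total first reaches @tonnes.
SELECT r.Id, r.Commodity, r.Quantity, r.RowNum, r.TotalQuantity
FROM (
    SELECT x.Id, x.Commodity, x.Quantity,
           ROW_NUMBER() OVER (ORDER BY x.RandKey) AS RowNum,
           SUM(x.Quantity) OVER (ORDER BY x.RandKey ROWS UNBOUNDED PRECEDING) AS TotalQuantity
    FROM #x AS x
) AS r
WHERE r.TotalQuantity - r.Quantity < @tonnes
ORDER BY r.RowNum;
```

Because the key is stored in #x, rerunning the final SELECT returns the same rows; rebuilding #x produces a new shuffle.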

    EDIT:

    If you need to get random rows until a certain total quantity is reached, you can do:

    SELECT i.*
    FROM (SELECT i.*, SUM(i.quantity) OVER (ORDER BY NEWID()) as cume_quantity
          FROM inventory i
          WHERE .........
                .........
         ) i
    WHERE cume_quantity - quantity < @tonnes;
    

    You don't need a temporary table. You don't need additional queries.
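To try the single-query approach without the real inventory table, something like the following sketch should work. The table variable @inventory, its sample rows, and the @tonnes value are hypothetical stand-ins, and the explicit ROWS frame and the final ORDER BY are additions for the demo rather than part of the query above.

```sql
-- Hypothetical sample data standing in for the inventory table.
DECLARE @inventory TABLE (id INT, commodity VARCHAR(3), quantity FLOAT);
INSERT INTO @inventory (id, commodity, quantity)
VALUES (1, 'WHT', 10), (2, 'WHT', 5), (3, 'BRL', 20), (4, 'BRL', 15), (5, 'OAT', 8);

DECLARE @tonnes FLOAT = 25;  -- made-up target quantity

-- Accumulate quantity in a random order and keep every row whose
-- running total before that row is still below the target.
SELECT i.id, i.commodity, i.quantity, i.cume_quantity
FROM (SELECT i.*,
             SUM(i.quantity) OVER (ORDER BY NEWID() ROWS UNBOUNDED PRECEDING) AS cume_quantity
      FROM @inventory i
     ) i
WHERE i.cume_quantity - i.quantity < @tonnes
ORDER BY i.cume_quantity;  -- show rows in the order they were accumulated
```

Each execution draws a fresh random order, so repeated runs will generally return different sets of rows whose quantities add up to at least @tonnes.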