I have a dataset where I need to randomly select x number of rows that meet certain criteria, x number of rows that meet other criteria, etc., and outputs the results all in one table. I've been using the following:
SELECT * FROM (SELECT TOP 1000 * FROM dbo.[Client List]
WHERE Source = 'HC' AND Service = 'Service1' AND Provider = 'ProviderName1'
ORDER BY NEWID()) Table1
UNION ALL
SELECT * FROM (SELECT TOP 500 * FROM dbo.[Client List]
WHERE Source = 'HC' AND Service = 'Service2' AND Provider = 'ProviderName2'
ORDER BY NEWID()) Table2
UNION ALL
SELECT * FROM (SELECT TOP 2200 * FROM dbo.[Client List]
WHERE Source = 'BA' AND Service = 'Service3' AND Provider = 'ProviderName3'
ORDER BY NEWID()) Table3
This works, but there's one problem: It's allowing duplicates of the client identifier (dbo.[Client List].[ClientID] to be selected. All of my results must be unique clients.
In other words, it must do the first random select, then do the second random select without being able to select any of the ClientIDs selected in the first select, and so on. (Yes, I realize that this is not technically "random".) Is there a way I can add some sort of code in the WHERE statement of each subsequent SELECT, or do I need to rethink the whole structure of the code? Thanks!
How about something like this?
with Service1 as
(
SELECT TOP 1000 *
FROM dbo.[Client List]
WHERE Source = 'HC'
AND Service = 'Service1'
AND Provider = 'ProviderName1'
ORDER BY NEWID()
)
, Service2 as
(
SELECT TOP 500 *
FROM dbo.[Client List]
WHERE Source = 'HC'
AND Service = 'Service2'
AND Provider = 'ProviderName2'
AND ClientID not in (select ClientID from Service1)
ORDER BY NEWID()
)
, Service3 as
(
SELECT TOP 2200 *
FROM dbo.[Client List]
WHERE Source = 'BA'
AND Service = 'Service3'
AND Provider = 'ProviderName3'
AND ClientID not in (select ClientID from Service1)
AND ClientID not in (select ClientID from Service2)
ORDER BY NEWID()
)
select *
from Service1
union all
select *
from Service2
union all
select *
from Service3