Recently I had to deal with a problem that I imagined would be pretty common: given a database table with a large (million+) number of rows to be processed, and various processors running in various machines / threads, how to safely allow each processor instance to get a chunk of work (say 100 items) without interfering with one another?
The reason I am getting a chunk at a time is for performance reasons - I don't want to go to the database for each item.
There are a few approaches - you could associate each processor a token, and have a SPROC that sets that token against the next [n] available items; perhaps something like:
(note - needs suitable isolation-level; perhaps serializable: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
)
(edited to fix TSQL)
UPDATE TOP (1000) WORK
SET [Owner] = @processor, Expiry = @expiry
OUTPUT INSERTED.Id -- etc
WHERE [Owner] IS NULL
You'd also want a timeout (@expiry
) on this, so that when a processor goes down you don't lose work. You'd also need a task to clear the owner on things that are past their Expiry
.