sql-serverdatabasesql-server-2008-r2message-queueservice-broker

Reliable asynchronous processing in SQL Server


Some of the services are provided to our customers by a 3rd party. The data which is created on their remote servers is replicated to an on-premises SQL server. I need to perform some work on that 3rd party server which database is not directly accessible to me.They expose a set of APIs for that purpose. The work is performed on a linked SQL server by a SQL Server Agent job.

Business scenario : customers can receive "badges" .A badge can be given to a customer by calling the UpdateCustomerBadgeInfo web method on a 3rd party server.

So a typical requirement for an automated task would look like this:

"Find all customers who logged in more than 50 times during theday, give them the [has-no-life] badge and send them an SMS notification"

The algorithm would be:

- Select all the matching accounts into a #TempTable  
 for each customer record:
 - Call UpdateCustomerBadgeInfo() method (via CLR)
 - If successfully updated badge info-> Enqueue SMS message (queue table)
 - Log successful actions (so that the record will not be picked up next time)

The biggest problem with the way it works now is that it takes a lot of time to process large datasets in a WHILE loop.

So the 3rd party provider created a solution to perform batch updates of the customer data. They created a table on the on-premises SQL server to which batch update requests are submitted and later picked up by their service for validation and processing.

The question is :

How the above algorithm should be changed to fit into this asynchronous model?


Solution

  • This answer is valid only if I understood the situation correctly:

    Generally, I don't see any significant changes to the algorithm, but I will try to explain what would I do in this case.

    1. Select all the matching accounts into a #TempTable

      This may not be neccessary because you already have the table to stuff your requests in - 3rd party table. Only problem would be synchronizing requests, but for this to analyze you have to provide more details (multiple requests for the same customer allowed? protection of re-issuing the same request?)

    2. for each customer record...

      This should be the only change in your implementation. It now has the meaning - for each customer record that is asynchronously processed on 3rd party side. Of course, your 3rd party must give you some clue that they really did process your customer requeset, or you have no idea what to work with. So, when they validate and proces the data, they can provide e.g. nullable columns 'success_time' and 'error_time' to leave you message what has been done and when. If there is success, you continue with processing. If not, you can probably do something about that as well.

      But how to react when you get async information back (e.g. sucess_time IS NOT NULL)? Well, there are multiple ways to do that. Personally I try to avoid triggers because they can make your life complicated (their visibility sucks, can cause problems with replication, can cause problems with transactions...) I use them if I really need first-class immediate responsiveness. Another possibility is using async queues with custom activation, which means Service Broker. However, a lot of people avoid using SB technology - it's different than the rest of SQL server, it has its speciffics, debugging is not so easy as with plain old SQL statements etc etc. Aother possibility would be batch processing async responses on your side using agent job. Since you are already using a job, you should be fine with it. Basically, the table should act as a synchronization point - you fill your requests (INSERT), 3rd party processes them (SELECT). After requests get processed they mark them as such (UPDATE success_time or error_time) and at the end you process that response (SELECT) using your agent job task. And your processing includes SMS message and logging, maybe even DELETING from 3rd party table.

      Another thing to mention is that you need synchronization methods here. First, don't do anything without transactions, or you may end up processing ghost responses and/or skipping valid waiting responses. Second, when you SELECT responses (rows that are procesed on 3rd party side), you could get some improvent using READPAST hint (skip what is locked). However, if you need to update/delete from your 3rd party table after processing response, you may use SELECT with UPDLOCK to block another side of temperig with the data between your INSERT and UPDATE. Or you don't use any locking hints if you are not completely sure what goes on with the table in question.

    Hope it helps.