Tags: api, rest, concurrency, stateless

RESTful APIs must be stateless, but what about concurrency?


I'm curious how to solve the concurrency issue for a RESTful API. More specifically, I have a collection of objects that need manual examination and update, e.g. a number of rows that need a column filled in by hand. However, if I open the API up to a number of clients, they will all grab these items from the top down, so many users will end up filling in the column of the same row at the same time. I'd prefer not to have collisions, and the simple, stateful way is to just dump items into a queue on the service and pop them off as people request them.

What is the stateless version of this? Hash by IP address, or randomly grab rows based on id?
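
To make that second idea a bit more concrete, something like this is roughly what I have in mind for the random grab (table and column names are made up, and I'm only using SQLite to sketch it):

    import sqlite3

    def grab_random_unfilled_row(db_path="items.db"):
        # Sketch only: pick one row whose hand-filled column is still empty,
        # chosen at random so concurrent clients rarely land on the same row.
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute(
                "SELECT id, data FROM items "
                "WHERE reviewed_value IS NULL "
                "ORDER BY RANDOM() LIMIT 1"
            ).fetchone()  # None when every row has already been filled in
        finally:
            conn.close()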

:: update ::

"Hrm, so it must simply be stateless from the perspective of the client?

That certainly makes a lot of sense. I was just reading an article (ibm.com/developerworks/webservices/library/ws-restful) about RESTful APIs, and after encountering the bit about paging I was worried that my quite stateful queue was similar to incrementing by a page. But they're actually quite different: "next page" is relative to where the client currently is, whereas "pop" is always stateless for the client; it doesn't matter what was popped before.

Thanks for clearing my head!" -Me


Solution

  • There are two basic approaches you can take:

    1. Go completely stateless, and adopt a "last request wins" strategy. As odd as it might sound, it's likely the cleanest solution in terms of predictability, scalability, code complexity and implementation effort on both the client and server sides. There's also plenty of precedent for it: look at how sites like Google paginate through queries using start=10 for page 2, start=20 for page 3, and so on.

      You might find that the content changes within pages as you navigate back and forth between them, but so what? You're always getting the latest information, and Google can handle your requests on any of their many servers without having to find your session information to determine what your last query context was.

      The biggest advantage of this approach is the simplicity of your server's implementation. Each request can pass right through to the data layer at the back end, and it's absolutely ripe for caching at both the HTTP level (via ETags or Last-Modified headers) and the server side (using something like memcache, for example). There's a rough sketch of what this can look like after this list.

    2. Go stateful, and figure out a way to have your servers dole out some kind of per-client lock or token for each API "session". This will be like trying to fight the ocean's tide with a stick, because you'll end up failing and frustrated.

      How will you identify clients? Session keys? IP address? File descriptor for the socket they rode in on (good luck with that if you're using a transport like HTTP where the connection can be closed between requests...)? The details you choose for this will have to be persisted on the server side, or you'll have to use some nasty old sticky session feature on your app server (and if so, heaven help your client if the server they are using goes down mid-session).

      How will you handle API clients that disappear ungracefully? Will you time out their session locks automatically by having a reaper thread clean up idle ones? That's more code, more complexity and more places for bugs to hide. And what about API clients that come back after a long idle period and try to re-use an expired lock? How should client applications be built to handle that situation?

    I could go on, but hopefully you can see my point. Go with option 1, and go stateless. Otherwise you'll end up trying to track client state on the server side. And the only thing that should track a client's state is the client itself.
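
    Just to make option 1 concrete, here's a rough sketch of the kind of handler I mean (Python/Flask purely for illustration; the endpoint, parameter names and data layer are all made up). Every request stands on its own: the client passes start and count, the server goes straight to the data layer, and an ETag gives you the HTTP-level caching mentioned above almost for free.

      import hashlib
      import json

      from flask import Flask, Response, request

      app = Flask(__name__)

      def fetch_items(start, count):
          # Placeholder for the real data-layer call (database query, cache, etc.).
          return [{"id": i, "value": None} for i in range(start, start + count)]

      @app.route("/items")
      def list_items():
          # Everything needed to answer the request is in the request itself.
          start = request.args.get("start", default=0, type=int)   # start=10 -> "page 2"
          count = request.args.get("count", default=10, type=int)
          body = json.dumps(fetch_items(start, count))

          # Cheap conditional GET: if the client already holds this exact
          # representation, answer 304 Not Modified and skip the body.
          etag = '"%s"' % hashlib.sha1(body.encode("utf-8")).hexdigest()
          if request.headers.get("If-None-Match") == etag:
              return Response(status=304, headers={"ETag": etag})
          return Response(body, mimetype="application/json", headers={"ETag": etag})

    A client that wants "page 2" simply asks for start=10 again; nothing about its previous requests has to live on the server.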