Let's say I have an array of memcache servers. The memcache client will make sure that a cache entry is kept on only a single memcache server, and all clients will always ask that server for the cache entry... right?
Now consider two scenarios:
[1] The web servers are getting lots of different requests (different URLs), so the cache entries get distributed among the memcache servers and requests fan out across the memcache cluster.
In this case, memcache's strategy of keeping a single cache entry on a single server works well.
[2] The web servers are getting lots of requests for the same resource, so all requests from the web servers land on a single memcache server, which is not desired.
What I am looking for is a distributed cache in which:
[1] Each web server can specify which cache node to use to cache stuff.
[2] If any web server invalidates a cache entry, the cache should invalidate it on all caching nodes.
Can memcache fulfill this use case?
PS: I don't have a ton of resources to cache, but I have a small number of resources with a lot of traffic asking for a single resource at once.
Memcache is a great distributed cache. To understand where a value is stored, it's a good idea to think of the memcache cluster as a hashmap, with each memcached process being precisely one pigeonhole in the hashmap (of course, each memcached is also an 'inner' hashmap, but that's not important for this point). For example, the memcache client determines the memcache node using this pseudocode:
index = hash(key) % len(servers)  # deterministically map the key to one server
value = servers[index].get(key)   # every client computes the same index for the same key
This is how the client can always find the correct server. It also highlights how important the hash function is, and how keys are generated - a bad hash function might not distribute keys uniformly over the different servers. The default hash function should work well in almost any practical situation, though.
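To make that uniformity claim concrete, here is a small self-contained sketch (plain Python; it uses md5 as a stand-in hash so the result is deterministic, whereas real memcache clients typically use their own hash such as CRC32) that counts how 100,000 keys land on 4 servers under the modulo scheme:

# Stand-in demonstration of modulo key distribution; md5 replaces the
# client's real hash purely to keep this example deterministic and runnable.
import hashlib
from collections import Counter

servers = ["cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211"]

def server_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

counts = Counter(server_for("user:%d" % i) for i in range(100000))
print(counts)  # each server ends up with roughly 25,000 keys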
Now you bring up in your scenario [2] the condition where the requests for resources are non-random, specifically favouring one or a few servers. If this is the case, it is true that the respective nodes are probably going to get a lot more requests, but this is relative. In my experience, memcache will be able to handle a vastly higher number of requests per second than your web server. It easily handles hundreds of thousands of requests per second on old hardware. So, unless you have 10-100x more web servers than memcache servers, you are unlikely to have issues. Even then, you could probably resolve the issue by upgrading the individual nodes to have more CPUs or more powerful CPUs.
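As a rough back-of-envelope check (both figures below are assumptions for illustration, not benchmarks): if one memcached node sustains 200,000 gets per second and each web server issues at most 2,000 cache lookups per second, a single 'hot' node can absorb all the traffic of about 100 web servers even when every lookup targets the same key:

# Illustrative capacity check for a single hot memcached node.
node_capacity_rps = 200000  # gets/sec one node can serve (assumed)
per_web_server_rps = 2000   # lookups/sec one web server issues (assumed)
print(node_capacity_rps // per_web_server_rps)  # -> 100 web servers of headroom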
But let us assume the worst case - you can still achieve this with memcache by:
[1] pointing each web server at a designated cache node for all of its reads, and
[2] fanning every write and invalidation out to all nodes, so a delete from any web server removes the entry everywhere (a sketch of this pattern follows below).
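Here is a minimal sketch of that pattern. Plain dicts stand in for the memcached nodes so the example is self-contained (a real version would hold one memcache client per node; the ReplicatedCache name and node list are mine, not part of memcache): reads only touch the node designated for this web server, while every set and delete fans out to all nodes.

# "Read local, write/invalidate everywhere" wrapper; dicts simulate the nodes.
class ReplicatedCache:
    def __init__(self, nodes, local_index):
        self.nodes = nodes               # one store (or client) per cache node
        self.local = nodes[local_index]  # the node this web server reads from

    def get(self, key):
        return self.local.get(key)       # reads never leave the local node

    def set(self, key, value):
        for node in self.nodes:          # duplicate the entry on every node
            node[key] = value

    def delete(self, key):
        for node in self.nodes:          # invalidate everywhere, per [2]
            node.pop(key, None)

nodes = [dict(), dict(), dict()]         # pretend these are 3 memcached servers
web_a = ReplicatedCache(nodes, local_index=0)
web_b = ReplicatedCache(nodes, local_index=1)

web_a.set("page:/home", "<html>...</html>")
print(web_b.get("page:/home"))           # B reads its own node and sees the entry
web_b.delete("page:/home")
print(web_a.get("page:/home"))           # None - invalidated on every node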
I personally have reservations about this - you are, by specification, disabling the distributed aspect of your cache, and the distribution is a key feature and benefit of the service. Also, your application code would start to need to know about the individual cache servers to be able to treat each differently, which is undesirable architecturally and introduces a large number of new configuration points.
The idea of any distributed cache is to remove the ownership of the location(*) from the client. Because of this, distributed caches and DBs do not allow the client to specify the server where the data is written.
In summary, unless your system is expecting hundreds of thousands of requests per second or more, it's doubtful that you will hit this specific problem in practice. If you do, scale the hardware. If that doesn't work, then you're going to be writing your own distribution logic, duplication, flushing and management layer over memcache. And I'd only do that if really, really necessary. There's an old saying in software development:
There are only two hard things in Computer Science: cache invalidation and naming things.
--Phil Karlton
(*) Some distributed caches duplicate entries to improve performance and, additionally, resilience in case a server fails, so data may be on multiple servers at the same time.