redis-cluster redisson elastic-cache

ElastiCache in cluster mode: CPU spikes on a single node only


I am adding AWS ElastiCache for Redis to a Java service. ElastiCache is configured in cluster mode, and the client is Redisson.

Symptom: at times during the POC, the CPU spikes on only a single node in the cluster. The AWS troubleshooting article https://aws.amazon.com/premiumsupport/knowledge-center/elasticache-redis-high-cpu-usage/ suggests looking at new connections, and I did find that spikes in new connections correlate with that node's CPU climbing to 90+%. The thing is, I don't control when Redisson creates new connections (at least, I have not added any Redisson configuration for this).

  1. Why is the cluster seeing new connections (maybe to support an increase in throughput)?
  2. What could be configured to avoid this CPU spike (likely due to new connections), which slows down all queries to the cluster?
  3. Am I on the right track in treating new connections as the cause of the CPU spikes? I am adding graphs showing the CPU spikes correlating with the AWS new-connections metric.

Solution

  • The question is quite open-ended, because an increase in connections can have multiple causes. I suggest checking the following:

    1. Check whether you have a hot shard, i.e. one of your shards is getting more traffic than the others. This can happen when your keys resolve to the same hash slots (Redis Cluster picks the slot from CRC16 of the key modulo 16384, and hashes only the part inside {...} when the key contains a hash tag). Try storing the SHA1 of your key instead of the key itself in Redis and see if that solves your problem. When fetching the value for a particular key, compute its SHA1 first and then make the call to Redis.
    2. Try using Lettuce instead of Redisson. Lettuce adapts better to topology changes in the cluster and manages the connection pool itself rather than exposing it as a configurable option. This post might help: https://aws.amazon.com/blogs/database/building-resiliency-at-scale-at-tinder-with-amazon-elasticache/?pg=ln&sec=c

    I think point 1 might solve your issue. Just a hunch.
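
To check the hot-shard idea from point 1 offline, here is a minimal sketch that computes the Redis Cluster hash slot for a key (CRC16/XMODEM modulo 16384, with hash-tag handling) and compares it with the slot of the key's SHA-1 hex form. The class name `KeySlots` and the sample keys are hypothetical, made up for illustration; substitute the key patterns your service actually uses. It uses only the JDK and does not talk to Redis or Redisson.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for inspecting how keys map to Redis Cluster slots.
final class KeySlots {

    // CRC16/XMODEM (polynomial 0x1021, initial value 0), the variant Redis Cluster uses.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Slot for a key: if the key has a non-empty {hash tag}, only the tag is hashed.
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    // Lowercase hex SHA-1 of the key, as suggested in point 1.
    static String sha1Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical keys sharing one hash tag, so they all land in one slot.
        String[] keys = {"{tenant42}:user:1", "{tenant42}:user:2", "{tenant42}:cart:9"};
        for (String k : keys) {
            System.out.println(k + " -> slot " + slot(k)
                    + ", sha1 key -> slot " + slot(sha1Hex(k)));
        }
    }
}
```

The three raw keys print the same slot because they share the `{tenant42}` tag, while their SHA-1 forms contain no braces and are hashed in full. Note that this only spreads load when many distinct keys collapse onto a few slots; a single very hot key still lives on one node after hashing.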