
Incident: All GCP Memorystore instances flushed


Yesterday we noticed connectivity issues on all of our GCP Memorystore (Redis) instances.

In the web console, under /memorystore/redis/instances, the status indicator showed a "loading" icon with the title text "Performing Maintenance".

A few minutes later, connectivity to the instances was restored; however, all data had been flushed (deleted)! It seemed as if the instances themselves had gone through some sort of reboot that wiped everything held in RAM.

We lost data in this incident and want to prevent it from happening again. Is there anything on our side that might have triggered this? We saw it across multiple projects, so we thought it could be a Google-wide incident. However, we didn't find anything about it online, and the GCP status page for Memorystore didn't report any incident or downtime.


Solution

  • I believe what happened here is that you are using Basic Tier instances. On Basic Tier instances, data persistence isn't guaranteed; they are better suited for use as a cache, as mentioned at this link.

    In this case you could use the Standard Tier instead. The same documentation states: "The Standard Tier provides a highly available Redis instance with automatic failover and minimal data loss."

    Basically, when a Basic Tier instance is moved into another state, such as the maintenance you saw, its data gets flushed.
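If you want to find out which of your instances are on the Basic Tier (and therefore exposed to this kind of flush), here is a minimal Python sketch. It assumes the `google-cloud-redis` client library (`google.cloud.redis_v1`) is installed and that Application Default Credentials are configured. The helper names `basic_tier_instances` and `list_instances` are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def basic_tier_instances(instances: Iterable) -> List[str]:
    """Return the resource names of instances whose tier is BASIC.

    Per the answer above, Basic Tier instances don't guarantee data
    persistence, while the Standard Tier (STANDARD_HA in the API) adds
    automatic failover.
    """
    # `tier` is an enum on redis_v1.Instance; comparing by member name keeps
    # this helper independent of the client library.
    return [inst.name for inst in instances if inst.tier.name == "BASIC"]


def list_instances(project_id: str) -> list:
    """Fetch every Memorystore for Redis instance in all regions of a project."""
    # Imported here so the helper above can be used without the library installed.
    from google.cloud import redis_v1

    client = redis_v1.CloudRedisClient()
    # The "-" location wildcard asks the API for instances in every region.
    parent = f"projects/{project_id}/locations/-"
    return list(client.list_instances(parent=parent))
```

For example, `basic_tier_instances(list_instances("my-project"))`, where `my-project` is a placeholder project ID, would return the full resource names of that project's Basic Tier instances, i.e. the ones that are candidates for the Standard Tier.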