I'm looking to use a distributed cache in Python. I have a FastAPI application and want every instance to have access to the same data, since our load balancer may route incoming requests to different instances. The problem is that I'm storing and editing information about a relatively big data set loaded from an Arrow Feather file and processed with Vaex. The Feather file automatically loads the correct types for the data. The data structure I need to store will use a user id as a key, and the value will be a large array of arrays of numbers. I've looked at Memcached and Redis as possible caching solutions, but both seem to store entries as strings / simple values. I'm looking to avoid parsing strings and doing extra processing on a large amount of data. Is there a distributed caching strategy that will let me persist types?
One solution we came up with is to store the data in multiple Feather files in a directory that is accessible to all instances of the app, but this seems messy, as you would need to clean up / delete the files after each session.
Redis 'strings' are actually able to store arbitrary binary data; they aren't limited to actual text. From https://redis.io/topics/data-types:
Redis Strings are binary safe, this means that a Redis string can contain any kind of data, for instance a JPEG image or a serialized Ruby object. A String value can be at max 512 Megabytes in length.
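To illustrate, here's a minimal sketch of storing typed Arrow data in a plain Redis string value, assuming redis-py and pyarrow are installed and a Redis server is reachable on localhost; the key name and helper functions are just for illustration. You'd need to wrap the per-user arrays in a table (or serialise them some other way, e.g. pickle), but the point is that the bytes go into Redis and come back out with their types intact:

```python
import pyarrow as pa
import pyarrow.feather as feather
import redis

r = redis.Redis(host="localhost", port=6379)

def cache_table(key: str, table: pa.Table) -> None:
    # Write the table to an in-memory buffer in Feather (Arrow IPC) format;
    # the schema, including column types, travels with the bytes.
    sink = pa.BufferOutputStream()
    feather.write_feather(table, sink)
    # Redis strings are binary safe, so the buffer can be stored as-is.
    r.set(key, sink.getvalue().to_pybytes())

def load_table(key: str) -> pa.Table:
    raw = r.get(key)
    # Reading from a BufferReader restores the table with its original types,
    # so there's no string parsing on the way back out.
    return feather.read_table(pa.BufferReader(raw))
```

Since this is just the Feather format in memory, the result can be fed back into Vaex or pandas the same way you already load it from a Feather file on disk.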
Another option is FlatBuffers, a serialisation format specifically designed to allow reading and writing serialised objects without an expensive deserialisation step.
That said, I would suggest reconsidering whether to store large, complex data structures as single cache values. The drawback is that any change means rewriting the entire value in the cache, which can get expensive, so consider breaking the data up into smaller key/value pairs if possible. The Redis Hash data type makes this easier to implement.
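For instance, a hash per user with one field per inner array means an update only rewrites the field that changed, and a single DEL removes the whole session's data, which also addresses the clean-up concern from the shared-directory idea. A minimal sketch, again assuming redis-py and numpy; the key/field naming scheme is just illustrative:

```python
import io
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)

def set_array(user_id: str, index: int, arr: np.ndarray) -> None:
    # np.save writes the dtype and shape into the header, so the array
    # round-trips without any manual type bookkeeping.
    buf = io.BytesIO()
    np.save(buf, arr)
    # Only this one field is rewritten; the user's other arrays stay untouched.
    r.hset(f"user:{user_id}", str(index), buf.getvalue())

def get_array(user_id: str, index: int) -> np.ndarray:
    raw = r.hget(f"user:{user_id}", str(index))
    return np.load(io.BytesIO(raw))

def delete_user(user_id: str) -> None:
    # One DEL removes the whole hash when the session ends.
    r.delete(f"user:{user_id}")
```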