I'm thinking of using SimpleCache to store pandas DataFrames in a Dash app. Apparently it just uses a Python dictionary. Is it possible to get a partially written DataFrame out of the SimpleCache? That's my big concern.
What other issues could I run into due to threading?
What does “not thread-safe” mean?
When you deploy your application with a WSGI server like gunicorn, you'll probably have multiple workers. Because SimpleCache is an in-memory cache and each worker process has its own memory, a cache key set by one worker won't be available to another worker. This can lead to strange behaviour if not accounted for, because successive requests to your app may be handled by different workers. This won't be apparent when running with the dev server, because there's only one process.
The workaround for this is to use a shared cache backend like Redis (see below), which runs as its own server process, so all workers are looking at the same cache.
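To see why a per-process cache diverges, here is a small standard-library sketch. A module-level dict (CACHE, a hypothetical stand-in for the dictionary inside SimpleCache) is written to inside a child process, much as one gunicorn worker would write to its own SimpleCache. The fork start method is used because gunicorn forks its workers; it is only available on POSIX systems.

```python
import multiprocessing as mp

# Hypothetical stand-in for the dict inside an in-memory cache such as SimpleCache.
CACHE = {}


def worker_sets_key(queue):
    # Runs in the child process, so this only changes the child's copy of CACHE.
    CACHE["somekey"] = "set by worker"
    queue.put(dict(CACHE))  # send back what the child can see


def run_demo():
    ctx = mp.get_context("fork")  # POSIX only; mirrors how gunicorn starts workers
    queue = ctx.Queue()
    proc = ctx.Process(target=worker_sets_key, args=(queue,))
    proc.start()
    child_view = queue.get()  # read before join() so the child can flush the queue
    proc.join()
    return child_view, dict(CACHE)


if __name__ == "__main__":
    child_view, parent_view = run_demo()
    print("child sees: ", child_view)   # {'somekey': 'set by worker'}
    print("parent sees:", parent_view)  # {}
```

With Redis as the backend, the write would instead go to a separate server process that every worker talks to, so the parent (or any other worker) would see the key.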
Is it possible to get a partially-written dataframe out of the SimpleCache?
You should probably test your own implementation of this, but here's a really basic example:
>>> import pandas as pd
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
Note that as of Werkzeug 1.0, SimpleCache was moved out to the separate cachelib package. So after running pip install cachelib, you can add df to the cache and extract it:
>>> from cachelib import SimpleCache
>>> cache = SimpleCache()
>>> cache.add('somekey', df)
True
>>> cache.get('somekey')
   col1  col2
0     1     3
1     2     4
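On the partially-written question specifically: as far as I know, cachelib's SimpleCache pickles each value to bytes when you add or set it and unpickles it in get(), so the dict only ever holds finished byte strings and every get() hands back a fresh copy (worth confirming against the cachelib version you install). The sketch below is a hypothetical, stripped-down model of that behaviour (SnapshotCache and find_torn_reads are not cachelib names), with plain lists standing in for DataFrames so it only needs the standard library. One thread keeps replacing the value while another reads it and records anything that looks torn.

```python
import pickle
import threading


class SnapshotCache:
    """Hypothetical, stripped-down model of SimpleCache's storage strategy.

    Values are pickled to bytes before they touch the dict, so the dict only
    ever holds complete, immutable snapshots.
    """

    def __init__(self):
        self._cache = {}

    def set(self, key, value):
        blob = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)  # build the full snapshot first
        self._cache[key] = blob  # then publish it with a single dict assignment

    def get(self, key):
        blob = self._cache.get(key)
        return None if blob is None else pickle.loads(blob)  # each caller gets its own copy


def find_torn_reads(rounds=2000):
    """Race one writer against one reader; return any value that mixes two versions."""
    cache = SnapshotCache()
    cache.set("df", [-1] * 100)
    torn = []

    def writer():
        for i in range(rounds):
            cache.set("df", [i] * 100)  # every version is internally uniform

    def reader():
        for _ in range(rounds):
            value = cache.get("df")
            if len(set(value)) != 1:  # mixed values would mean a partial write
                torn.append(value)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return torn


if __name__ == "__main__":
    print("torn reads:", len(find_torn_reads()))
```

In CPython a single dict assignment like self._cache[key] = blob happens as one step, which is why the reader only ever sees the old snapshot or the new one. What this does not protect is a multi-step sequence such as get, modify, then set from two threads at once: both can read the same old value and one update is silently lost. That kind of race is usually what "not thread-safe" refers to, and it needs a threading.Lock around the whole sequence.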
To make this work with your Redis server, it should be as simple as swapping the first two lines for:
from cachelib import RedisCache
cache = RedisCache(host='your-redis-server')
This means a minimal code change in production still lets you use SimpleCache when developing (add and get are methods of both classes).
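Since add and get are all the code needs, one way to keep that change down to a single spot is to choose the backend in a factory function and write everything else against just those two methods. This is a sketch under assumptions, not the only way to do it: the REDIS_HOST environment variable, make_cache, get_or_build and the CacheLike protocol are hypothetical names introduced here, and RedisCache additionally needs the redis package installed.

```python
import os
from typing import Any, Callable, Optional, Protocol


class CacheLike(Protocol):
    """The subset of the cachelib interface this sketch relies on."""

    def add(self, key: str, value: Any, timeout: Optional[int] = None) -> bool: ...

    def get(self, key: str) -> Any: ...


def make_cache() -> CacheLike:
    """Pick the backend in one place. REDIS_HOST is a hypothetical env var name."""
    host = os.environ.get("REDIS_HOST")
    if host:
        from cachelib import RedisCache  # shared by all gunicorn workers
        return RedisCache(host=host)
    from cachelib import SimpleCache  # per-process; fine for the dev server
    return SimpleCache()


def get_or_build(cache: CacheLike, key: str, build: Callable[[], Any]) -> Any:
    """Return the cached value, building and adding it on a miss.

    cachelib's get() returns None on a miss, so None itself cannot be cached this way.
    """
    value = cache.get(key)
    if value is None:
        value = build()
        cache.add(key, value)  # add() does not overwrite a key another worker set first
    return value
```

In the Dash callbacks you would then call something like get_or_build(cache, "somekey", load_dataframe), where load_dataframe is whatever (hypothetical) function builds your DataFrame, and only make_cache knows which backend is in use.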