I have a Node.js backend service that currently fetches data directly from a Postgres database, and I am working on introducing a caching middle layer.
This is my first time using Redis, and I am facing a few challenges in using it efficiently for my complex queries. My confusion comes down to two things: how do we map from a query to a key in the Redis cache, and how should we update the Redis cache once update/create functions are invoked?
I have, for example, the following function that joins multiple tables (only two shown here for brevity):
async findAll(query: PaginateQuery, user: User): Promise<Paginated<any>> {
  const limit = query.limit && query.limit < MAX_LIMIT ? query.limit : MAX_LIMIT
  const page = query.page ?? DEFAULT_PAGE
  const skip = (page - 1) * limit
  const queryBuilder = getConnection()
    .getRepository(tableEntryX)
    .createQueryBuilder('tableX')
    .leftJoin('tableX.fk1', 'fk1')
    .addSelect(['fk1.id', 'fk1.gender'])
    .leftJoin('tableX.fk2', 'fk2')
    .addSelect(['fk2.id', 'fk2.weigth'])
  if (query.filter?.createdAt) {
    // update the query based on this filter
  }
  // filter 2
  // filter 3
  // fetch
  const [data, count] = await Promise.all([queryBuilder.skip(skip).take(limit).getMany(), queryBuilderCount.getCount()])
  const res = EMPTY_PAGINATED_DATA
  res.data = data
  res.meta.itemsPerPage = limit
  res.meta.currentPage = page
  res.meta.totalItems = count
  res.meta.totalPages = Math.ceil(count / limit)
  res.meta.sortBy = sorting
  return res
}
The same function checks for a number of filters, and based on those filters we alter the query with andWhere/orWhere calls.
How should we cache this, while also taking into account that saving a new entry in the table may or may not imply a change in the cached data?
So my initial thought was simply to hash the query and cache the result under that key. This is a fairly simple solution and probably not too bad in terms of memory consumption (acknowledging that such an approach might lead to duplicated cache entries, since some queries could partially overlap each other).
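Roughly what I have in mind (a simplified sketch; I am assuming an ioredis client here, and findAll is the existing function shown above):

import { createHash } from 'crypto'
import Redis from 'ioredis'

const redis = new Redis() // assumed ioredis client

// Build a deterministic cache key from the paginate query.
// NOTE: JSON.stringify is sensitive to property order, so the same logical
// query should always be normalized the same way before hashing.
function cacheKeyFor(query: PaginateQuery): string {
  const hash = createHash('sha1').update(JSON.stringify(query)).digest('hex')
  return `tableX:findAll:${hash}`
}

async function findAllCached(query: PaginateQuery, user: User): Promise<Paginated<any>> {
  const key = cacheKeyFor(query)
  const hit = await redis.get(key)
  if (hit) return JSON.parse(hit)

  const res = await findAll(query, user) // the existing DB path shown above
  await redis.set(key, JSON.stringify(res), 'EX', 60) // short TTL to bound staleness
  return res
}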
However, I find the more pressing problem with this approach is what happens when an update or a new entry takes place: what should we do with all our cached entries? It seems wasteful to simply evict everything and start from scratch, especially given that updates/creates happen quite regularly.
Any cache is always a trade-off, so the correct answer will depend on many factors (how many reads, how many writes, how many parameters, which indexes, and so on). Let's consider a couple of the simplest approaches.
You can store a copy of the full table's data in the cache (Redis, in your case) and then implement filtering in your code, avoiding SQL-based filtering entirely. This approach moves the load from your database to the cache. Cache invalidation is straightforward in this case: you just rewrite the cached copy whenever you change the data.
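A minimal sketch of this idea, assuming an ioredis client and a hypothetical matchesFilters helper that implements your filter logic in JS:

import Redis from 'ioredis'
import { getConnection } from 'typeorm'

const redis = new Redis()
const TABLE_KEY = 'tableX:all' // hypothetical key holding the full-table snapshot

// Refresh the snapshot on every create/update/delete; this is the whole
// invalidation strategy for this approach.
async function refreshTableCache(): Promise<void> {
  const rows = await getConnection()
    .getRepository(tableEntryX)
    .find({ relations: ['fk1', 'fk2'] })
  await redis.set(TABLE_KEY, JSON.stringify(rows))
}

// Reads never touch Postgres: filter and paginate in memory.
async function findAllFromCache(query: PaginateQuery): Promise<any[]> {
  const rows: any[] = JSON.parse((await redis.get(TABLE_KEY)) ?? '[]')
  const page = query.page ?? 1
  const limit = query.limit ?? 20
  return rows
    .filter(row => matchesFilters(row, query.filter)) // your filter logic, moved out of SQL
    .slice((page - 1) * limit, page * limit)
}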
Another option is to store table rows in the cache, where the key is the row ID and the value is the whole raw row (object). You still filter rows on the SQL side, but instead of full rows the query returns only a list of filtered/sorted IDs; you then enrich those IDs with data from the cache. This approach saves network bandwidth, lets you build covering indexes, and reduces disk I/O.
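A sketch of this ID-enrichment variant (again assuming ioredis; cacheRow would be called from your save/update path, and the key/function names are just illustrative):

import Redis from 'ioredis'
import { getConnection, SelectQueryBuilder } from 'typeorm'

const redis = new Redis()
const rowKey = (id: number) => `tableX:row:${id}` // one cache key per row

// Keep the row cache in sync on writes (call this after save/update).
async function cacheRow(row: { id: number }): Promise<void> {
  await redis.set(rowKey(row.id), JSON.stringify(row))
}

// SQL still does the filtering/sorting, but returns only IDs;
// Redis supplies the row bodies.
async function findFiltered(queryBuilder: SelectQueryBuilder<any>): Promise<any[]> {
  const ids: { id: number }[] = await queryBuilder.select('tableX.id', 'id').getRawMany()
  if (ids.length === 0) return []

  const cached = await redis.mget(...ids.map(r => rowKey(r.id)))
  // Fall back to the database for any row missing from the cache.
  return Promise.all(
    cached.map((hit, i) =>
      hit ? JSON.parse(hit) : getConnection().getRepository(tableEntryX).findOne(ids[i].id)
    )
  )
}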
My personal recommendation is to analyze your data and your requests. You may not need a complex cache solution: often, caching just the query with default parameters can save 95% of the resources.
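For example, a sketch that caches only the "default" request (no filters, first page) and sends everything else straight to the database; redis and findAll are the same assumed pieces as in the sketches above:

async function findAllWithHotPathCache(query: PaginateQuery, user: User): Promise<Paginated<any>> {
  // Only the hot path is cached; filtered or deep-paginated requests hit Postgres.
  const isDefaultQuery = !query.filter && (query.page ?? 1) === 1
  if (!isDefaultQuery) return findAll(query, user)

  const hit = await redis.get('tableX:findAll:default')
  if (hit) return JSON.parse(hit)

  const res = await findAll(query, user)
  await redis.set('tableX:findAll:default', JSON.stringify(res), 'EX', 30)
  return res
}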