I've spent my entire career working with denormalized relational databases. I am having a hard time un-learning all of that in order to implement a single-table design that can handle a couple specific access patterns on an "App Store"-like personal project.
Here's a quick ERD. There is an App model identified by a platform (iOS, Android) and bundle identifier along with a Defaults map that is used when creating new versions. Each App can have 0 to many Versions which are identified by a version number (which is a sequential numerical value and is unique within the context of an App). A version has an IsReleased attribute along with several others (like Name, Release Notes, Binary Path, etc).
Access Patterns
I'm having trouble with 1 though 4, this table is where I was headed. I'm having a hard time coming with a GSIs that will give me the all of the app items with a single version by sort order.
pk | sk | Defaults | App Name | Version | IsReleased | Other Attributes |
---|---|---|---|---|---|---|
app_ios_com.app.one |
defaults |
{ ... json ... } |
||||
app_ios_com.app.one |
version_1 |
App One | 1 | 1 | ||
app_ios_com.app.one |
version_2 |
App One | 2 | 1 | ||
app_ios_com.app.one |
version_3 |
App One | 3 | 1 | ||
app_ios_com.app.two |
defaults |
{ ... json ... } |
||||
app_ios_com.app.two |
version_1 |
App Two | 1 | 1 | ||
app_ios_com.app.two |
version_2 |
App Two | 2 | 0 | ||
app_ios_com.app.two |
version_3 |
App Two | 3 | 0 |
For example, for access pattern 1, I want:
pk | sk | Defaults | App Name | Version | IsReleased | Other Attributes |
---|---|---|---|---|---|---|
app_ios_com.app.one |
version_3 |
App One | 3 | 1 | ||
app_ios_com.app.two |
version_3 |
App Two | 3 | 0 |
For example, for access pattern 3, I would want:
pk | sk | Defaults | App Name | Version | IsReleased | Other Attributes |
---|---|---|---|---|---|---|
app_ios_com.app.one |
version_2 |
App One | 3 | 1 | ||
app_ios_com.app.two |
version_1 |
App Two | 1 | 1 |
Some data constraints that I have to keep in mind:
I feel like the solution is right in front of me, but I can't put my finger on it.
TLDR; The solution that springs to mind is a leaderboard pattern to cache the latest app versions in separate record(s). Whenever a new version is added, DynamoDB Streams sends the change as an event to lambda, which then updates the denormalised Latest records.
Note: One piece of information was missing from your excellent write-up: how often do you need to perform the latest
queries? If not very often, then "scan-and-done" will be OK for your current volumes. If the answer is 1k latest
queries per minute, then it's a different story. The good news is that your basic table design is solid. Latest
query optimisation can be implemented incrementally when performance/cost problems arise, without messing with the table design.
We will keep denormalised copie(s) of the latest versions, another sinful-sounding DynamoDB pattern. The Stream-triggered lambda will update those records using the update API when a version is added or changes release status. How to store the latest version info? We have several options:
latest
data in a singleton record with map attributes {app1: {latest version copy}, app2: ...}
. You can put more logic into the records to handle the isReleased
items, or simply fetch the record and filter in your backend.app_id
. The records
have the same info as in #1.GSI1PK=Latest#Released AND begins_with(GSI1SI, "IOS")
GSI1PK GSI1SK
Latest app_ios_com.app.one
Latest IOS#app_ios_com.app.one
Latest#Released app_ios_com.app.one
Latest#Released IOS#app_ios_com.app.one
Note: If you have high query volumes and low cardinality, hot partitions may be a problem for these "leaderboard" type deormalised patterns. If this becomes a problem, you can address it by keeping multiple copies of each "latest" record, e.g. have X copies that are queried randomly latest-copy1
, latest-copy2
, latest-copy3
. Amazon calls this pattern sharding using calculated suffixes.