amazon-dynamodb amazon-dynamodb-data-modeling

DynamoDB Table Design: How do I model a one to many relationship where I need all of the "one" items and one of "many" sorted by some attribute

I've spent my entire career working with denormalized relational databases. I am having a hard time un-learning all of that in order to implement a single-table design that can handle a couple specific access patterns on an "App Store"-like personal project.

Here's a quick ERD. There is an App model identified by a platform (iOS, Android) and bundle identifier along with a Defaults map that is used when creating new versions. Each App can have 0 to many Versions which are identified by a version number (which is a sequential numerical value and is unique within the context of an App). A version has an IsReleased attribute along with several others (like Name, Release Notes, Binary Path, etc).

Access Patterns

List the latest version of every app.
List the latest version of every app for a given platform.
List the latest version of every app where IsReleased is 1.
List the latest version of every app for a given platform where IsReleased is 1.
Get the latest version of a specific app.
Get the latest version of a specific app where IsReleased is 1.
Get all versions of a specific app.
Get all versions of a specific app where IsReleased is 1.
Get the Default attribute for a specific app.

I'm having trouble with 1 though 4, this table is where I was headed. I'm having a hard time coming with a GSIs that will give me the all of the app items with a single version by sort order.

pk	sk	Defaults	App Name	Version	IsReleased
`app_ios_com.app.one`	`defaults`	`{ ... json ... }`
`app_ios_com.app.one`	`version_1`		App One	1	1
`app_ios_com.app.one`	`version_2`		App One	2	1
`app_ios_com.app.one`	`version_3`		App One	3	1
`app_ios_com.app.two`	`defaults`	`{ ... json ... }`
`app_ios_com.app.two`	`version_1`		App Two	1	1
`app_ios_com.app.two`	`version_2`		App Two	2	0
`app_ios_com.app.two`	`version_3`		App Two	3	0

For example, for access pattern 1, I want:

pk	sk	Defaults	App Name	Version	IsReleased	Other Attributes
`app_ios_com.app.one`	`version_3`		App One	3	1
`app_ios_com.app.two`	`version_3`		App Two	3	0

For example, for access pattern 3, I would want:

pk	sk	Defaults	App Name	Version	IsReleased	Other Attributes
`app_ios_com.app.one`	`version_2`		App One	3	1
`app_ios_com.app.two`	`version_1`		App Two	1	1

Some data constraints that I have to keep in mind:

There are currently only 10 to 20 apps, but I need to be able to support hundreds
Most apps will have 100 to 200 versions with 20 to 30 released versions. The biggest app has 1000 versions of which 50 are released.
In the back-end, the IsReleased flag will typically be toggled from 0 to 1, but will occasionally be toggled from 0 to 1.
The average version item is approximately 2 KB.
The access pattern variations where IsReleased is 1 are more frequently used by a significant margin.

I feel like the solution is right in front of me, but I can't put my finger on it.

Solution

TLDR; The solution that springs to mind is a leaderboard pattern to cache the latest app versions in separate record(s). Whenever a new version is added, DynamoDB Streams sends the change as an event to lambda, which then updates the denormalised Latest records.

Note: One piece of information was missing from your excellent write-up: how often do you need to perform the latest queries? If not very often, then "scan-and-done" will be OK for your current volumes. If the answer is 1k latest queries per minute, then it's a different story. The good news is that your basic table design is solid. Latest query optimisation can be implemented incrementally when performance/cost problems arise, without messing with the table design.

Denormalising the Latest Versions

We will keep denormalised copie(s) of the latest versions, another sinful-sounding DynamoDB pattern. The Stream-triggered lambda will update those records using the update API when a version is added or changes release status. How to store the latest version info? We have several options:

Store all latest data in a singleton record with map attributes {app1: {latest version copy}, app2: ...}. You can put more logic into the records to handle the isReleased items, or simply fetch the record and filter in your backend.
Use a Global Secondary Index with one record per app. Each record has "latest" as the GSI1PK and GSI1SK of app_id. The records have the same info as in #1.
Use as GSI with multiple records per app. Something like this seems to work. For instance, query #4 would use GSI1PK=Latest#Released AND begins_with(GSI1SI, "IOS")

GSI1PK              GSI1SK
Latest              app_ios_com.app.one
Latest              IOS#app_ios_com.app.one
Latest#Released     app_ios_com.app.one
Latest#Released     IOS#app_ios_com.app.one

Note: If you have high query volumes and low cardinality, hot partitions may be a problem for these "leaderboard" type deormalised patterns. If this becomes a problem, you can address it by keeping multiple copies of each "latest" record, e.g. have X copies that are queried randomly latest-copy1, latest-copy2, latest-copy3. Amazon calls this pattern sharding using calculated suffixes.