amazon-dynamodb aws-appsync single-table-inheritance

How can I efficiently query all threads from more than one group, sorted by lastPostedAt?

I am using a single-table design in DynamoDB, with Group and Thread items stored in one table in a one-to-many relation. All threads are sorted by lastPostedAt, which is updated whenever a new comment or a new suggestion is posted in a thread. The table keys are PK=Group#{id}, SK=Thread#{id}. Getting all threads in a single group is no problem: GSI PK=GroupID, SK=Thread#lastPostedAt.

Access pattern: get all threads from more than one group, sorted by lastPostedAt.

How should I model that access pattern with a GSI?

Is it OK to use a single large partition key together with a filter expression? GSI PK=Thread (without id), SK=lastPostedAt, FilterExpression=GroupID IN ('Group1','Group2','Group3')

With a filter expression, pagination can become difficult if 'Group4' has a large number of elements compared to 'Group1', 'Group2' and 'Group3'.

For example, if 'Group4' contains 1000 elements and 'Group1', 'Group2' and 'Group3' contain only 100 elements each, the imbalance makes it hard to get an even distribution of results across pages.

schema.graphql

type Group
{
    id: ID!
    users: [User]
    admins: [User]
    title: String
}
type Thread
{
    id: ID!
    groupId: ID!
    term: String
    owner: User
    lastPostedAt: AWSDateTime
}

Solution

I think the right answer here is that DynamoDB is not a good choice for this use case. Consider using a different DB technology or a secondary storage that builds your sorted view.

That being said, using a static hash key “Thread” is a possible option, but it would introduce scaling limits, as a single partition in DynamoDB can generally support only up to 3,000 RCU and 1,000 WCU per second. If you are sure that you’ll never hit that limit, then go ahead.
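To make the static-key option concrete, here is a minimal sketch of the request it implies, written as a helper that builds keyword arguments for boto3's Table.query (resource API). The index name ThreadsByLastPostedAt, the GSI key attribute gsiPk and the helper build_thread_query are assumptions made up for illustration; only groupId and the static value "Thread" come from the question. It has not been run against a real table.

```python
def build_thread_query(group_ids, page_size=25, start_key=None):
    """Build keyword arguments for boto3's Table.query (resource API)."""
    if not group_ids:
        raise ValueError("group_ids must not be empty")

    # One placeholder per group: :g0, :g1, ...
    placeholders = {f":g{i}": gid for i, gid in enumerate(group_ids)}

    params = {
        "IndexName": "ThreadsByLastPostedAt",  # hypothetical GSI name
        "KeyConditionExpression": "#pk = :pk",
        "FilterExpression": f"#gid IN ({', '.join(placeholders)})",
        "ExpressionAttributeNames": {
            "#pk": "gsiPk",      # hypothetical GSI partition key attribute
            "#gid": "groupId",
        },
        "ExpressionAttributeValues": {":pk": "Thread", **placeholders},
        "ScanIndexForward": False,  # descending sort key: newest first
        "Limit": page_size,         # items evaluated per page, before the filter
    }
    if start_key is not None:
        # LastEvaluatedKey from the previous page, if any
        params["ExclusiveStartKey"] = start_key
    return params
```

Assuming a boto3 Table object named table, a page would be fetched with table.query(**build_thread_query(["Group1", "Group2", "Group3"])), passing the response's LastEvaluatedKey back as start_key until it is absent.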

Keep in mind that filtering on groups will not reduce the number of records that DynamoDB processes during the query. It only affects the returned result; you will still be charged for all records read (and performance will also suffer).
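The effect on pagination can be illustrated with a small in-memory simulation. This is not DynamoDB code, only a sketch of the documented order of operations (read up to Limit items, then apply the filter); the function query_page and the sample data are made up for illustration.

```python
def query_page(items, allowed_groups, limit, start=0):
    """Mimic one Query page: read up to `limit` items, then filter them."""
    evaluated = items[start:start + limit]  # these items are read and billed
    matched = [it for it in evaluated if it["groupId"] in allowed_groups]
    end = start + limit
    next_start = end if end < len(items) else None  # stand-in for LastEvaluatedKey
    return matched, len(evaluated), next_start


# Made-up index contents, already in sort-key order:
# ids 0 and 10 belong to Group1, every other thread belongs to Group4.
threads = [
    {"id": i, "groupId": "Group4" if i % 10 else "Group1"}
    for i in range(20)
]

start = 0
while start is not None:
    matched, evaluated, start = query_page(threads, {"Group1"}, limit=5, start=start)
    print(f"evaluated={evaluated} returned={len(matched)}")
```

Every page evaluates five items, yet some pages return one thread and others return none, so the caller has to keep paging to fill a screen while paying for every item read.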