We have a table in Dynamo DB, attributes to be saved are as below.
We are using API gateway and integrating it with Dynamo DB directly. We want to get the latest N records after filtering.
We thought of using Scan with limit of N, but this is not working as "limit" limits the scanning before filtering and gives us inconsistent result. We have created GSI on CreatedDate for the below post request to work.
{
"TableName": "<TABLE NAME>",
"Limit": $input.params('pageSize'),
"FilterExpression": "CreatedDate > :v1",
"ExpressionAttributeValues": {
":v1": {"S": "2024-05-03T06:32:22"}
},
"ReturnConsumedCapacity": "TOTAL",
}
We also want to sort the result set in desc order of CreatedDate, latest first.
Do you think it is possible, if not do we need to consider DB change?
For this, I would add a single value to your items (assuming your write throughput doesn't exceed 1000WCU per second).
ProcessId | CreatedDate | Filename | GSI1PK | Random |
---|---|---|---|---|
123 | 2024-03-02T20:00:000Z | myfile100 | 1 | some values |
001 | 2024-02-01T20:00:000Z | myfile202 | 1 | some values |
928 | 2024-03-04T20:00:000Z | myfile412 | 1 | some values |
102 | 2024-04-05T20:00:000Z | myfile339 | 1 | some values |
Now you create an index on GSI1PK
as partition key and CreatedDate
as sort key:
GSI1PK | CreatedDate | Filename | ProcessId | Random |
---|---|---|---|---|
1 | 2024-02-01T20:00:000Z | myfile202 | 001 | some values |
1 | 2024-03-02T20:00:000Z | myfile100 | 123 | some values |
1 | 2024-03-04T20:00:000Z | myfile412 | 928 | some values |
1 | 2024-04-05T20:00:000Z | myfile339 | 102 | some values |
Notice how all of your items are now stored in ascending order of CreatedDate
. Now you issue a Query
and have ScanIndexForward=False
which will read the latest items first (DESC).
{
"TableName": "<TABLE NAME>",
"IndexName": <INDEX NAME>",
"Limit": $input.params('pageSize'),
"KeyConditionExpression": "GSI1PK = :v1",
"ExpressionAttributeValues": {
":v1": {"S": "1"}
},
"ReturnConsumedCapacity": "TOTAL",
"ScanIndexForward": False
}
Of course it doesn't have to be an index, you could use my index schema here as your base table, but that ultimately depends on your specific data access requirements.
More info on this blog post