amazon-web-servicesamazon-dynamodbaws-api-gatewayvtl

How to get recent entries from Dynamo DB


We have a table in Dynamo DB, attributes to be saved are as below.

We are using API gateway and integrating it with Dynamo DB directly. We want to get the latest N records after filtering.

We thought of using Scan with limit of N, but this is not working as "limit" limits the scanning before filtering and gives us inconsistent result. We have created GSI on CreatedDate for the below post request to work.

{
  "TableName": "<TABLE NAME>",
  "Limit": $input.params('pageSize'),
  "FilterExpression": "CreatedDate > :v1",
  "ExpressionAttributeValues": {
      ":v1": {"S": "2024-05-03T06:32:22"}
  },
  "ReturnConsumedCapacity": "TOTAL",

}

We also want to sort the result set in desc order of CreatedDate, latest first.

Do you think it is possible, if not do we need to consider DB change?


Solution

  • For this, I would add a single value to your items (assuming your write throughput doesn't exceed 1000WCU per second).

    ProcessId CreatedDate Filename GSI1PK Random
    123 2024-03-02T20:00:000Z myfile100 1 some values
    001 2024-02-01T20:00:000Z myfile202 1 some values
    928 2024-03-04T20:00:000Z myfile412 1 some values
    102 2024-04-05T20:00:000Z myfile339 1 some values

    Now you create an index on GSI1PK as partition key and CreatedDate as sort key:

    GSI1PK CreatedDate Filename ProcessId Random
    1 2024-02-01T20:00:000Z myfile202 001 some values
    1 2024-03-02T20:00:000Z myfile100 123 some values
    1 2024-03-04T20:00:000Z myfile412 928 some values
    1 2024-04-05T20:00:000Z myfile339 102 some values

    Notice how all of your items are now stored in ascending order of CreatedDate. Now you issue a Query and have ScanIndexForward=False which will read the latest items first (DESC).

    {
      "TableName": "<TABLE NAME>",
      "IndexName": <INDEX NAME>",
      "Limit": $input.params('pageSize'),
      "KeyConditionExpression": "GSI1PK = :v1",
      "ExpressionAttributeValues": {
          ":v1": {"S": "1"}
      },
      "ReturnConsumedCapacity": "TOTAL",
      "ScanIndexForward": False
    }
    

    Of course it doesn't have to be an index, you could use my index schema here as your base table, but that ultimately depends on your specific data access requirements.

    More info on this blog post