DynamoDB Model Chronological Shipment Update Data


I recently started learning about DynamoDB single-table design. I am now trying to model shipment update data that has the following properties:

Access patterns:

  1. get all shipments of an account, showing each shipment's latest status, ordered by ETA ascending
  2. for a given shipment, get its updates in chronological order

[Two images from the original post are not available.]

I am having difficulty supporting both access patterns at once. If I kept only one record per shipment, I could set the sort key of the shipment update item to shpm#55abc, and retrieving all shipments for a given account by ETA would be straightforward via the GSI accountEta.

How do I model this to get the access patterns I need? Should I consider a separate table for the shipment update audit, i.e. one that stores only the shipment updates? Then, for access pattern #2, I would query this audit table by shipment id to get all the chronological updates. But I feel like this defeats the purpose of the single-table design.


Solution

  • A single-table design is a good fit for these access patterns. Use generic key names such as PK and SK so that the same keys can be overloaded across item types. Here is one approach*:

    Shipments have a "current" record. Add a global secondary index (GSI1) to create an alternate Primary Key for querying by account in ETA order (pattern #1). All changes to the shipment are executed as updates to this "current" record.

    # shipment "current" record
    PK             SK                                 GSI1PK            GSI1SK
    shpmt#55abc    x_current                          account#123       x_eta#2022-07-01
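
As a rough sketch of pattern #1 against this key layout, the function below only builds the parameters for a low-level Query call on GSI1. The table name `shipments` and the function name are hypothetical. The `begins_with` condition on the `x_eta#` prefix is an assumption that follows from the key values shown above: it keeps the result to "current" records even when other item types share the same GSI1PK. ISO dates sort lexicographically in chronological order, so an ascending scan returns the earliest ETA first.

```python
def shipments_by_eta_query(account_id: str, table_name: str = "shipments") -> dict:
    """Build Query parameters for pattern #1: an account's shipments, ETA ascending."""
    return {
        "TableName": table_name,  # hypothetical table name
        "IndexName": "GSI1",      # index name as used in the answer
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "GSI1PK", "#sk": "GSI1SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"account#{account_id}"},
            ":prefix": {"S": "x_eta#"},  # matches only "current" records
        },
        "ScanIndexForward": True,  # ascending by GSI1SK, i.e. by ETA
    }


params = shipments_by_eta_query("123")
```

With boto3 installed and credentials configured, the dictionary could be passed on as `boto3.client("dynamodb").query(**params)`.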
    

    Next, enable DynamoDB Streams on the table to capture shipment changes. Each time a "current" record is updated, the Lambda backing the Stream writes the OLD_IMAGE to the table as a change control record. Because the update record keeps the shipment's PK and the account's GSI1PK, the change history can be queried by shipment (pattern #2) and also by account.

    # shipment update record
    PK             SK                                 GSI1PK           GSI1SK
    shpmt#55abc    update#2022-06-28T06:10:33.247Z    account#123      update#2022-06-28T06:10:33.247Z
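
The transformation done by the stream-backed Lambda could look roughly like the sketch below. It is a pure function over one stream record in the shape Lambda receives (`eventName`, `dynamodb.Keys`, `dynamodb.OldImage`), so it can be tried without AWS. The function name and the `changed_at` parameter are hypothetical; where the timestamp comes from is left to the caller. Note the filter on `x_current`: the update items are written to the same table and therefore show up in the same stream, so the handler has to ignore them to avoid processing its own writes.

```python
from typing import Optional


def to_update_item(record: dict, changed_at: str) -> Optional[dict]:
    """Turn one stream record into a change-history item, or None to skip it."""
    data = record.get("dynamodb", {})
    # Only modifications of "current" records produce history items. This also
    # skips the INSERT events caused by the history items written here.
    if record.get("eventName") != "MODIFY":
        return None
    if data.get("Keys", {}).get("SK", {}).get("S") != "x_current":
        return None
    old_image = data.get("OldImage")
    if not old_image:
        return None  # the stream view type does not include old images
    item = dict(old_image)  # shallow copy; PK and GSI1PK stay unchanged
    item["SK"] = {"S": f"update#{changed_at}"}
    item["GSI1SK"] = {"S": f"update#{changed_at}"}
    return item


# Sample record; the "status" attribute is made up for illustration.
sample = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"PK": {"S": "shpmt#55abc"}, "SK": {"S": "x_current"}},
        "OldImage": {
            "PK": {"S": "shpmt#55abc"},
            "SK": {"S": "x_current"},
            "GSI1PK": {"S": "account#123"},
            "GSI1SK": {"S": "x_eta#2022-07-01"},
            "status": {"S": "IN_TRANSIT"},
        },
    },
}
history_item = to_update_item(sample, "2022-06-28T06:10:33.247Z")
```

The returned item is in the low-level attribute-value format, so it could be written back with a `PutItem` call on the same table.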
    

    One virtue of this approach is that a single query operation can retrieve both the current shipment record and all or part of its change history, newest first. This is the reason for the x_ prefix on the current record's keys: "x_current" sorts after every "update#..." key, so both item types fall inside one sort key range. A query with the key condition PK = "shpmt#55abc" AND SK >= "update", with ScanIndexForward=False and Limit=2, returns the current record (x_current) followed by the latest update record.
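
To make the sort key reasoning concrete, here is a small sketch. The two functions only build low-level Query parameters (the table name `shipments` and the function names are hypothetical), and the last lines imitate the ordering locally with plain string comparison, which agrees with DynamoDB's byte ordering for ASCII keys like these. The earlier of the two update timestamps is a made-up sample value.

```python
def current_and_latest_update_query(shipment_id: str, table_name: str = "shipments") -> dict:
    """Current record plus the most recent update, newest first."""
    return {
        "TableName": table_name,  # hypothetical table name
        "KeyConditionExpression": "#pk = :pk AND #sk >= :lower",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"shpmt#{shipment_id}"},
            ":lower": {"S": "update"},
        },
        "ScanIndexForward": False,  # descending: x_current comes first
        "Limit": 2,
    }


def shipment_updates_query(shipment_id: str, table_name: str = "shipments") -> dict:
    """Pattern #2: only the update records, oldest first."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"shpmt#{shipment_id}"},
            ":prefix": {"S": "update#"},
        },
        "ScanIndexForward": True,  # ascending: chronological order
    }


# Local imitation of the first query's ordering on the sort keys alone.
sort_keys = [
    "update#2022-06-27T09:00:00.000Z",  # made-up earlier update
    "update#2022-06-28T06:10:33.247Z",
    "x_current",
]
matched = sorted((k for k in sort_keys if k >= "update"), reverse=True)[:2]
print(matched)  # ['x_current', 'update#2022-06-28T06:10:33.247Z']
```

Raising `Limit` in the first query returns more of the history after the current record, while the second query leaves the current record out entirely.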

    * Whether this is a good solution for you also depends on expected read/write volumes.