amazon-dynamodbnosql

How does DynamoDB's LIMIT parameter work exactly?


Extending from DynamoDB Limit on query.

I have a use case where the need is to fetch the latest available exchange rate from table. I've considered the following setup:

Now for reading the latest available data, I was thinking about using this query:

aws dynamodb query \
  --table-name exchange_rates \
  --key-condition-expression "fx_pair = :partitionKeyVal" \
  --expression-attribute-values '{":partitionKeyVal":{"S":"SGD_CNY"}}' \
  --no-scan-index-forward \
  --limit 1 \
  --return-consumed-capacity TOTAL

But reading the AWS documentation, I'm a bit confused on how LIMIT works internally. According to their documentation, it specifies that:

The maximum number of items to evaluate (not necessarily the number of matching items)

Now for the query shown above, there isn't any filter expression.

  1. Will my query guarantee that I will receive the latest record every time? Is it efficient in terms of RCUs? (I did my own testing and noticed it was using 0.5 RCUs for a table with 200k records and the records scanned was also 1)

  2. If I modify my query to add any filter expressions, will I be getting the latest record that satisfies the filters?

  3. If limit of 5 (or any other number for that matter) is specified, how does dynamoDb decide which 5 records to evaluate/process? Is it based on the sort key?

According to me, this approach makes the writes easier as well (it would be an insert only table).

Another suggestion was to use version based inserts/updates as mentioned here - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html#bp-sort-keys-version-control Here, the V0 th record would always have the latest value and accessing it would be using partitionKey= and sortKey=.

This approach adds a bit more complexity when writing records and I think it would need more WCUs as well since there are inserts & updates happening. Am I wrong here?

Would like some insights on pros/cons of both approaches and my questions.


Solution

    1. Using a Limit is efficient, you'll only evaluate 1 item in this scenario and get returned 1 item should there be 1.

    2. Using a filter expression on top of when a Limit can cause issue, as you read 1 item before applying the filter. It does not guarantee you'll get 1 that matches the filter. You should build your filtering logic into the sort key.

    3. It'll read the 5 most recent items, as you're sorting in DESC order. Data is sorted on disk by the sort key, hence it's name.

    Only concern I see is that your sort key is in seconds, meaning 2 records with the same key written in the same second will overwrite one another. Consider using a ULID as sort key to ensure uniqueness.