Tags: amazon-web-services, aws-lambda, amazon-dynamodb, throughput, capacity

DynamoDB: one bulk scan vs. many single gets


Suppose I have a Lambda function, and in the event parameter I receive about 50 primary IDs that I have to look up in a DynamoDB table. What would be the better way to do it: 50 GetItem requests, each with a different primary ID, or one Scan, followed by comparing the scanned primary IDs against the primary IDs received as the parameter?

I think 50 GetItem requests would be better on the performance side, because if tomorrow I have one million records, it would be a waste of time and memory to scan them all and then keep only 50 of them. On the other hand, couldn't making 50 requests to DynamoDB cause performance issues and require more provisioned capacity?


Solution

  • You're right that a Scan operation, assuming you only need to read 50 records out of a million, is the worst possible solution. It will be very slow, and it will cost you a pretty penny, because when you scan, you pay Amazon to read all of your data - even if you filter most of it out.
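
    To illustrate the anti-pattern, here is a minimal sketch in Python with boto3; the table name my-table and the string partition key id are assumptions for the example, not from the question. Every page of the Scan is billed in full, no matter how few items survive the filter:

    ```python
    # Anti-pattern sketch: read the whole table, keep only the wanted ids.
    # Assumes a table named "my-table" with a string partition key "id".
    import boto3

    table = boto3.resource("dynamodb").Table("my-table")

    def scan_and_filter(wanted_ids):
        wanted = set(wanted_ids)
        matches = []
        kwargs = {}
        while True:  # Scan paginates; every page is billed in full
            page = table.scan(**kwargs)
            matches.extend(i for i in page["Items"] if i["id"] in wanted)
            if "LastEvaluatedKey" not in page:
                return matches
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    ```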

    Making 50 separate GetItem requests isn't so bad - it's certainly better than a Scan. You only pay Amazon for the items actually retrieved; you don't pay more because they come in 50 separate requests. Of course, if you don't want huge latency, don't issue these requests one after another - start them all in parallel, as in the sketch below.
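
    A minimal sketch of the parallel approach, again assuming a table named my-table with a string partition key id (the low-level boto3 client is used because it is documented as thread-safe):

    ```python
    # Sketch: issue the 50 GetItem calls in parallel from a thread pool.
    # Assumes a table named "my-table" with a string partition key "id".
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    client = boto3.client("dynamodb")  # low-level clients are thread-safe

    def get_one(item_id):
        response = client.get_item(
            TableName="my-table",
            Key={"id": {"S": item_id}},  # the low-level API uses typed values
        )
        return response.get("Item")  # None if the id does not exist

    def get_many(ids):
        with ThreadPoolExecutor(max_workers=len(ids)) as pool:
            return [item for item in pool.map(get_one, ids) if item]
    ```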

    But for this use case, DynamoDB provides an even better operation: BatchGetItem. With this operation you give DynamoDB the list of 50 required keys in just one HTTP request, and it fetches all of them (in parallel) and returns all the responses to you. BatchGetItem seems to be the best fit for your use case; see the sketch below.
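
    A minimal sketch, with the same assumed table and key names. Two caveats worth noting: BatchGetItem accepts at most 100 keys (and 16 MB of data) per request, and it can return throttled keys in UnprocessedKeys, which you should retry:

    ```python
    # Sketch: fetch all 50 keys in a single BatchGetItem round trip,
    # retrying any keys DynamoDB returns as unprocessed.
    # Assumes a table named "my-table" with a string partition key "id".
    import boto3

    dynamodb = boto3.resource("dynamodb")

    def batch_get(ids, table_name="my-table"):
        items = []
        request = {table_name: {"Keys": [{"id": i} for i in ids]}}
        while request:
            response = dynamodb.batch_get_item(RequestItems=request)
            items.extend(response["Responses"].get(table_name, []))
            # Throttled keys come back in UnprocessedKeys; loop
            # (ideally with exponential backoff) until none remain.
            request = response.get("UnprocessedKeys") or None
        return items
    ```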