In the DynamoDB query/scan best practices guide, I read that you can avoid sudden spikes in read activity by reducing the page size of your query/scan (down from the default maximum of 1 MB).
I have a few cases where I need to scan an entire table, and I'm using paginateScan
to loop through and collect all the items, something like this:
```javascript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, paginateScan } from '@aws-sdk/lib-dynamodb';

const ddbDocClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function fetchAll() {
  const pagedScan = paginateScan(
    { client: ddbDocClient, pageSize: 100 },
    { TableName: 'tickets' }
  );
  const results = [];
  for await (const page of pagedScan) {
    // page.Items is an array, so spread it rather than pushing the array itself
    results.push(...(page.Items ?? []));
  }
  return results;
}
```
But I'm not sure whether there are any tradeoffs to setting the pageSize
lower versus keeping it at the 1 MB maximum.
Does setting a lower page size just make the query/scan take longer, so the reads are spread out over a longer period of time? For instance, if I need to scan an entire table and don't care how long it takes, should I set the page size to 1 item?
Does it matter at all if I'm ultimately just looping through to collect all the items?
Page size affects both latency and cost.
A lower page size means more round trips to DynamoDB, which increases the overall latency of the Scan.
A lower page size can actually increase the cost somewhat, because the data read by each request is rounded up to the nearest 4 KB when computing consumed capacity. A Scan with a page size of 1 would therefore cost you a full read unit per item, whereas in reality you could probably fit dozens of items into a single 4 KB unit.
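You can see the rounding effect with some quick arithmetic. This is a sketch, not an official AWS formula: it assumes uniformly sized items and applies the documented rule that each Scan request's read volume is rounded up to the nearest 4 KB, with an eventually consistent read costing 0.5 RCU per 4 KB (1 RCU if strongly consistent):

```javascript
// Estimate the RCU cost of scanning a table at a given page size.
// Assumes every item is the same size (itemSizeBytes), purely for illustration.
function scanCostRcus(itemCount, itemSizeBytes, pageSize, stronglyConsistent = false) {
  const requests = Math.ceil(itemCount / pageSize);
  let rcus = 0;
  for (let i = 0; i < requests; i++) {
    const itemsInPage = Math.min(pageSize, itemCount - i * pageSize);
    // Per request: sum the item sizes, then round up to the nearest 4 KB unit
    const fourKbUnits = Math.ceil((itemsInPage * itemSizeBytes) / 4096);
    rcus += stronglyConsistent ? fourKbUnits : fourKbUnits / 2;
  }
  return rcus;
}

// 10,000 items of 400 bytes each, eventually consistent:
scanCostRcus(10000, 400, 1);   // page size 1:   5,000 RCUs (every item rounds up to 4 KB)
scanCostRcus(10000, 400, 100); // page size 100: 500 RCUs (10 items fit in each 4 KB unit)
```

So at a page size of 1, this hypothetical scan costs roughly ten times as much as the same scan at a page size of 100.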
Reducing the page size can be helpful if it's a background process and you don't want to consume all the capacity from the table, impacting its normal traffic. A lower page size will spread the reads out over a longer time and reduce how much capacity is consumed on a per-second basis. This could also save costs if you're using provisioned mode, but it's a trade-off between that and my earlier point about rounding up to 4 KB.
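If you want tighter control than page size alone gives you, you can also pace the scan by sleeping between pages. This is a minimal sketch under assumed numbers: `rcusPerPage` is your own estimate of what one page costs, and `budgetRcusPerSecond` is the share of table capacity you're willing to let the background scan consume; neither comes from the SDK:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// How long to wait after each page so the scan averages at most
// `budgetRcusPerSecond`, given that each page costs roughly `rcusPerPage`.
function delayBetweenPagesMs(rcusPerPage, budgetRcusPerSecond) {
  return Math.max(0, (rcusPerPage / budgetRcusPerSecond) * 1000);
}

// Drain any paginateScan iterator, pausing between pages to stay under budget.
async function fetchAllPaced(pagedScan, rcusPerPage, budgetRcusPerSecond) {
  const results = [];
  for await (const page of pagedScan) {
    results.push(...(page.Items ?? []));
    await sleep(delayBetweenPagesMs(rcusPerPage, budgetRcusPerSecond));
  }
  return results;
}
```

For example, if each page costs about 5 RCUs and you budget 25 RCUs/sec for the scan, `delayBetweenPagesMs(5, 25)` gives a 200 ms pause between pages.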