I'm writing a simple logging service on top of DynamoDB. I have a logs table that is keyed by a user_id hash and a timestamp (Unix epoch int) range.

When a user of the service terminates their account, I need to delete all of that user's items in the table, regardless of the range value.

What is the recommended way of doing this sort of operation, keeping in mind there could be millions of items to delete?

My options, as far as I can see, are:
A: Perform a Scan operation, calling delete on each returned item, until no items are left.

B: Perform a BatchGet operation, again calling delete on each item until none are left.
Both of these look terrible to me as they will take a long time.
What I ideally want to do is call LogTable.DeleteItem(user_id) without supplying the range, and have it delete everything for me.
An understandable request indeed; I can imagine advanced operations like this might get added over time by the AWS team (they have a history of starting with a limited feature set and evaluating extensions based on customer feedback). In the meantime, here is what you should do to at least avoid the cost of a full scan:
- Use Query rather than Scan to retrieve all items for user_id. This works regardless of the combined hash/range primary key in use, because HashKeyValue and RangeKeyCondition are separate parameters in this API, and the former only targets the attribute value of the hash component of the composite primary key.

  Please note that you'll have to deal with the query API's paging here as usual; see the ExclusiveStartKey parameter: "Primary key of the item from which to continue an earlier query. An earlier query might provide this value as the LastEvaluatedKey if that query operation was interrupted before completing the query; either because of the result set size or the Limit parameter. The LastEvaluatedKey can be passed back in a new query request to continue the operation from that point."

- Loop over all returned items and call DeleteItem on each as usual (a sketch of this loop follows the list). Update: most likely BatchWriteItem is more appropriate for a use case like this (see below for details).
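To make the paging concrete, here is a minimal sketch of that query-and-delete loop. It uses the current boto3 SDK, whose KeyConditionExpression parameter has replaced the older HashKeyValue/RangeKeyCondition parameters mentioned above; the table name "logs" and the delete_user_logs helper are hypothetical, and #ts aliases timestamp because the latter is a DynamoDB reserved word:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("logs")  # hypothetical table name


def delete_user_logs(user_id):
    """Query one page of matching items at a time, deleting each item."""
    query_kwargs = {
        "KeyConditionExpression": Key("user_id").eq(user_id),
        # Fetch only the key attributes; DeleteItem needs nothing else.
        # "#ts" aliases "timestamp", which is a DynamoDB reserved word.
        "ProjectionExpression": "user_id, #ts",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
    }
    while True:
        page = table.query(**query_kwargs)
        for item in page["Items"]:
            table.delete_item(Key={"user_id": item["user_id"],
                                   "timestamp": item["timestamp"]})
        # LastEvaluatedKey is absent once the final page has been returned.
        if "LastEvaluatedKey" not in page:
            break
        query_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```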
Update: As highlighted by ivant, the BatchWriteItem operation enables you to put or delete several items across multiple tables in a single API call [emphasis mine]:
"To upload one item, you can use the PutItem API and to delete one item, you can use the DeleteItem API. However, when you want to upload or delete large amounts of data, such as uploading large amounts of data from Amazon Elastic MapReduce (EMR) or migrating data from another database into Amazon DynamoDB, this API offers an efficient alternative."
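At the wire level, a single BatchWriteItem call carries a map of table names to put/delete requests. A rough boto3 sketch of a delete-only batch (table name and key values are placeholders), including the UnprocessedItems map you must check afterwards:

```python
import boto3

client = boto3.client("dynamodb")

# A delete-only batch against the (hypothetical) "logs" table; keys are
# expressed in the low-level attribute-value format.
response = client.batch_write_item(RequestItems={
    "logs": [
        {"DeleteRequest": {"Key": {
            "user_id": {"S": "user-123"},      # placeholder hash key
            "timestamp": {"N": "1650000000"},  # placeholder range key
        }}},
        # ... more put/delete requests, up to the per-call maximum ...
    ]
})

# DynamoDB may process only part of the batch; anything left over comes back
# in UnprocessedItems and should be resubmitted (ideally with backoff).
unprocessed = response["UnprocessedItems"]
```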
Please note that this still has some relevant limitations, most notably:
Maximum operations in a single request — You can specify a total of up to 25 put or delete operations; however, the total request size cannot exceed 1 MB (the HTTP payload).
Not an atomic operation — Individual operations specified in a BatchWriteItem are atomic; however, BatchWriteItem as a whole is a "best-effort" operation and not an atomic operation. That is, in a BatchWriteItem request, some operations might succeed and others might fail. [...]
Nevertheless, this obviously offers a potentially significant gain for use cases like the one at hand (a combined sketch follows).
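Putting the two pieces together: in boto3, the higher-level batch_writer() context manager takes care of chunking deletes into 25-operation BatchWriteItem calls and resubmitting unprocessed items for you. A sketch under the same assumptions as above (hypothetical "logs" table and purge_user_logs helper):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("logs")  # hypothetical table name


def purge_user_logs(user_id):
    """Delete every item under one hash key via BatchWriteItem.

    batch_writer() buffers deletes into 25-operation BatchWriteItem calls
    and automatically resubmits any unprocessed items.
    """
    query_kwargs = {
        "KeyConditionExpression": Key("user_id").eq(user_id),
        "ProjectionExpression": "user_id, #ts",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
    }
    with table.batch_writer() as batch:
        while True:
            page = table.query(**query_kwargs)
            for item in page["Items"]:
                batch.delete_item(Key={"user_id": item["user_id"],
                                       "timestamp": item["timestamp"]})
            if "LastEvaluatedKey" not in page:
                break
            query_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```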