amazon-dynamodb amazon-dynamodb-index aws-sdk-go-v2

DynamoDB Global Secondary Index "Batch" Retrieval

I've seen older posts about this but am hoping to bring the topic up again. I have a table in DynamoDB that has a UUID for the primary key, and I created a global secondary index (GSI) for a more business-friendly key. For example:

| account_id  | email           | first_name | last_name |
|------------ |---------------- |----------- |---------- |
| 4f9cb231... | linda@gmail.com | Linda      | James     |
| a0302e59... | bruce@gmail.com | Bruce      | Thomas    |
| 3e0c1dde... | harry@gmail.com | Harry      | Styles    |

If account_id is my primary key and email is my GSI key, how do I query the table to get accounts with email in ('linda@gmail.com', 'harry@gmail.com')? I looked at the IN conditional expression but it doesn't appear to work with a GSI. I'm using the Go SDK v2 library but will take any guidance. Thanks.

Solution

Short answer: you can't.

DDB is designed to return a single item, via GetItem(), or a set of related items, via Query(). "Related" means that you're using a composite primary key (hash key and sort key) and the related items all share the same hash key (aka partition key).

Another way to think of it: you can't Query() a whole DDB table or index. You can only Query() a single partition key value within a table or index.

Scan() is the only operation that works across partitions in one shot. But scanning is very inefficient and costly since it reads the entire table every time.

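You'll need to issue a separate request for every email you want returned. Since email is the key of a GSI rather than of the table itself, each of those requests is a Query() against the index; GetItem() only reads by the table's primary key.

DDB also offers BatchGetItem(), which lets you send multiple, up to 100, GetItem() requests in a single call. It saves a little bit of network time and can run the requests in parallel, but otherwise it is little different from what your application could do itself with GetItem(). Make no mistake, BatchGetItem() is making individual GetItem() requests behind the scenes. In fact, the requests in a BatchGetItem() don't even have to be against the same table. The cost for each request in a batch will be the same as if you'd used GetItem() directly. Like GetItem(), though, it only reads by a table's primary key and can't target an index, so it helps here only once you know the account_id values (or if email were the table's primary key). For lookups through the email GSI, your application runs the per-email Query() calls itself, in parallel if you like.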

One difference to note: BatchGetItem() can return at most 16 MB of data per call. So if your DDB items are large, you may not get as many returned as you requested.

For example, if you ask to retrieve 100 items, but each individual item is 300 KB in size, the system returns 52 items (so as not to exceed the 16 MB limit). It also returns an appropriate UnprocessedKeys value so you can get the next page of results. If desired, your application can include its own logic to assemble the pages of results into one dataset.