amazon-dynamodbamazon-dynamodb-indexaws-java-sdk-2.xaws-java-sdk-dynamodbdynamodb-enhanced-client

What's the cost effective way to retrieve batch DDB Items by LSI sort keys?


I have a DDB table with the following structure:

CustmerID (PK) emailId (SortKey) orderId (LSI)

I want to get batch items using AWS DynamoDB Java SDK for the following 2 patterns:

  1. Get Items by a PK and batch of 50 emailId (SK)
  2. Get Items by a PK and batch of 50 order ID (LSI SK)

The first access pattern I was able to implement using

ReadBatch.Builder<Customer> customerBuilder = ReadBatch.builder(Customer.class)
      .mappedTableResource(customerTable);

emails.forEach(email -> {
      Customer key = new Customer();
      key.setCustomerId(custId);
      key.setEmail(email);
      customerBuilder.addGetItem(key);
    });

BatchGetItemEnhancedRequest batchGetItemEnhancedRequest =
    BatchGetItemEnhancedRequest.builder()
      .addReadBatch(customerBuilder.build())
      .build();

When I am trying to perform the same operation using the LSI sort key i.e. orderId, the request fails with a 400 error.

slight change that I made for orderId, I couldn't find any API to set the LSI index name though.

orderIds.forEach(orderId -> {
      Customer key = new Customer();
      key.setCustomerId(custId);
      key.setOrderId(orderId);
      customerBuilder.addGetItem(key);
    });

BatchGetItemEnhancedRequest batchGetItemEnhancedRequest =
    BatchGetItemEnhancedRequest.builder()
     
 .addReadBatch(customerBuilder.build())
      .build();

Customer table bean

@DynamoDbBean
@Getter
@Setter
public class Customer {
  private String customerId;
  private String emailId;
  private String orderId;

  @DynamoDbPartitionKey
  @DynamoDbAttribute("customerId")
  public String getCustomerId() {
    return customerId;
  }
  
  @DynamoDbSortKey
  @DynamoDbAttribute("emailId")
  public String getEmailId() {
    return emailId;
  }

  @DynamoDbSecondarySortKey(indexNames = {"orderId-index"})
  public String getOrderId() {
    return orderId;
  }
}

There are 2 approaches I could think of:

use PartiQL

SELECT * FROM "tableName"."lsiIndexName" WHERE "customerId" = ? and "orderId" in (?, ?, ?);  // batch of 50

use filter expression by performing Query on primary index instead of LSI

String keyConditionExpression = "#customerId" + " = :partitionKeyVal";

String filterExpression = "#orderId" + " IN (:value1, :value2, :value, :value)";

// can't query against LSI unfortunately since the SDK doesn't allow filter expressions with multiple sort keys
QueryRequest queryRequest = QueryRequest.builder()
      .tableName("customer")
      .keyConditionExpression(keyConditionExpression)
      .filterExpression(filterExpression)
      .expressionAttributeNames(expressionAttributeNames)
      .expressionAttributeValues(expressionAttributeValues)
      .build();

Cost and performance wise what's the appropriate way to perform batch retrieval by LSI SKs considering that I could have more than 20k items per partition?


Solution

  • Firstly, note that BatchGetItem just counts as multiple GetItem, so that there is no cost savings here. The only savings is that you do it all in one HTTP call, so latency will be better than firing individual requests.

    So, cost-wise, rather than the option you proposed where you get the entire partition and filter, it is actually more cost-efficient to loop through all the PK (Partition key and LSI combination) and fire multiple GetItem. Unfortunately, there is no BatchGetItem for LSI.

    In any case, the inability to perform BatchGetItem using LSI makes it clear that LSI is meant for single-item retrieval. If you really need batch retrievals, such as when firing individual requests results in too much HTTP overhead, separate into a second table instead. That is, create another table having the orders-related information with the PK being CustomerId and OrderId. Surely, if the original table PK is CustomerId and Email, OrderId cannot be an LSI? I mean just how many "emails" can an "order" have!