Tags: java, amazon-web-services, kotlin, amazon-dynamodb

Index-Based Querying and Sorting - DynamoDB


I come from an Android application development background. I am exploring DynamoDB and have written a Kotlin server that is linked to it. While I have successfully set up the connection, the table, and its partition and sort keys, I'm having some trouble with a Global Secondary Index.

I'm trying to build a paginated listing API that needs to support the following parameters.

Query by:

title, release date, genres, min rating, max rating, tags.

Sort by:

release date, rating.

In my assets table, imdb_id is the partition key. The sort key (SK) is either DETAILS, whose item holds all the essential non-key info (title, release date, rating, description, etc.), or RECOMMENDATION#<imdb_id>. I chose this SK format because I needed a way to keep a list of recommended movies for each movie. I could have stored the list in a single field, as a JSON array serialized to a String or as a String array, but since I am storing the movies' details anyway (and want them to be available in the database as queryable items), it made much more sense to keep them as separate records. So I store the original movie as a DETAILS record and each recommended movie as a separate RECOMMENDATION#<imdb_id> record under it. A recommendation record contains only the imdb_id of the recommended movie, which can be used to fetch that movie's DETAILS. Finally, each recommended movie is also stored as its own DETAILS record.

This way, I can keep a record of all the recommendations for every movie while also storing the DETAILS info of the recommended movies. I hope that makes sense! 🙃
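The layout described above can be sketched with plain Kotlin data, independent of DynamoDB. The `AssetRecord` class and the helper names are hypothetical and only illustrate the key scheme: one `DETAILS` item per movie plus one `RECOMMENDATION#<imdb_id>` item per recommendation, all sharing the movie's `imdb_id` as partition key.

```kotlin
// Hypothetical in-memory model of the single-table layout; not the real table schema.
data class AssetRecord(
    val imdbId: String,          // partition key (imdb_id)
    val sk: String,              // sort key: "DETAILS" or "RECOMMENDATION#<imdb_id>"
    val title: String? = null,   // only present on DETAILS records
)

const val DETAILS_SK = "DETAILS"
const val RECOMMENDATION_PREFIX = "RECOMMENDATION#"

// Builds the sort key for a recommendation record.
fun recommendationSk(recommendedImdbId: String): String =
    RECOMMENDATION_PREFIX + recommendedImdbId

// Builds all records stored under one movie's partition: its DETAILS record
// plus one RECOMMENDATION record per recommended movie.
fun recordsFor(imdbId: String, title: String, recommendedIds: List<String>): List<AssetRecord> =
    listOf(AssetRecord(imdbId, DETAILS_SK, title)) +
        recommendedIds.map { AssetRecord(imdbId, recommendationSk(it)) }

// Extracts the imdb_ids of the recommended movies from one partition's records.
fun recommendedIds(records: List<AssetRecord>): List<String> =
    records
        .filter { it.sk.startsWith(RECOMMENDATION_PREFIX) }
        .map { it.sk.removePrefix(RECOMMENDATION_PREFIX) }
```

Every record in a partition shares the same `imdbId`, so any lookup that matches on the partition key alone returns the `DETAILS` record together with all of its `RECOMMENDATION#...` records.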

So, I created a Global Secondary Index in Terraform like so:

global_secondary_index {
  name               = "ReleaseDateIndex"
  hash_key           = "imdb_id"
  range_key          = "release_date"
  projection_type    = "INCLUDE"
  non_key_attributes = [
      "posters",
      "title",
      "release_date",
      "tmdb_rating",
      "mpaRating"
  ]
}

And when I query the results in my DynamoDB Enhanced Client like so:

    val index = table.index("ReleaseDateIndex")

    val queryCondition = QueryConditional.keyEqualTo(
        Key
            .builder()
            .partitionValue(
                type,
            )
            .build()
    )

    val queryBuilder = QueryEnhancedRequest.builder()
        .queryConditional(queryCondition)
        .scanIndexForward(orderBy == "asc")
        .limit(limit)

    // Execute query
    return index.query(
        queryBuilder.build()
    ).flatMap { page ->
        page.items()
    }

I get duplicate objects in my query result. I understand why: the query returns every item regardless of its sort key, so both the DETAILS item and the RECOMMENDATION items come back. But I only want the items with SK = DETAILS.

I could apply a FilterExpression, but the filter runs only after the items have been read, which increases read cost and slows down the overall request.

So, I am looking for a way to query only the objects where SK = DETAILS.

I am open to schema changes or any other optimisations since this is for learning practice.

Examples of how the API would behave:

curl --location 'https://<api_url>/listing?imdb_id=tt0053219&direction=asc&query=movie%20title&sortBy=release_date&limit=50'

My current understanding is that I cannot apply a filter on both title and genre at the same time (it might be possible by using a composite SK on the index, but that is a topic for another time).

So, for now, a genre fetch would look something like:

curl --location 'https://<api_url>/listing?imdb_id=tt0053219&direction=asc&genre=drama&sortBy=release_date&limit=50'

The imdb_id is purely intended as an exclusive start key and will be null for the first page.

Ideally, sortBy has these options:

enum class SortBy(
    val value: String,
) {
    TITLE("title"),
    RATING("rating"),
    LIKES("likes"),
    RELEASE_DATE("release_date"),
    ;
}

Solution

  • You're asking about efficient querying on an index. For a query to be efficient, the selection must be done in the KeyConditionExpression. DynamoDB allows only two keys (a partition key and a sort key), and the partition key must be matched with a full equality condition.

    If you need to filter on multiple attributes, you must include them in the keys, using a composite key.

    A filter expression can be inefficient, but it depends on how many items you filter out. If, for example, you filter out only a few items, the performance and cost will likely stay about the same. If you filter out the majority, the query has become inefficient. You can monitor the efficiency from the Count and ScannedCount values in the response of each request:

    Count / ScannedCount * 100
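
The two ideas in this answer can be sketched in plain Kotlin, without the AWS SDK. The composite layout `<record type>#<release date>` and all function names are assumptions made for illustration; they are not part of the schema in the question. The first two helpers build a composite sort-key value and its prefix, so that a single key condition (for example a begins-with on `DETAILS#`) could select only `DETAILS` items while keeping them ordered by release date. The last helper computes the efficiency figure from the `Count` and `ScannedCount` values returned with a query response.

```kotlin
// Hypothetical composite sort-key value: "<record type>#<release date>".
// Assumes releaseDate is an ISO-8601 string (YYYY-MM-DD), so that
// lexicographic order matches chronological order.
fun compositeSortKey(recordType: String, releaseDate: String): String =
    "$recordType#$releaseDate"

// Prefix to use in a begins-with key condition to match one record type only.
fun recordTypePrefix(recordType: String): String = "$recordType#"

// Filter efficiency in percent: Count / ScannedCount * 100.
// count        = items returned after the filter expression was applied
// scannedCount = items read before the filter expression was applied
// An empty read (scannedCount == 0) is treated as 100% here, since nothing was wasted.
fun filterEfficiencyPercent(count: Int, scannedCount: Int): Double =
    if (scannedCount == 0) 100.0 else count.toDouble() / scannedCount * 100.0
```

For example, a query that reads 200 items and returns 50 of them has an efficiency of 25%, meaning three quarters of the items read were paid for and then discarded by the filter.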