amazon-dynamodbdynamodb-queriesamazon-dynamodb-index

How to query multiple DynamoDB objects at the time


I have a table Tours with a Partition Key (PK) id, which is unique and a Global Secondary Index (GSI) geohash, which has repeated values.

I need to get all the geohashes in the array geohashNeighbors, and I am using the following query:

const geohashNeighbors = ['geo1', 'geo2', 'geo3']

const queryPromises = geohashNeighbors.map(geohash => 
    dynamodb.send(new QueryCommand({
        TableName: 'Tours',
        IndexName: 'geohash_index',
        KeyConditionExpression: 'geohash = :geohash',
        ExpressionAttributeValues: {
            ':geohash': geohash
        }
    }))
)

const results = await Promise.all(queryPromises)

As you can see, I am sending one request per item in the array, but I would like to send only one request and get all the items at once.

This is what I tried:

I come from the relational DB world and have only a few weeks of experience with DynamoDB. I searched for hours and tried a few ideas, but either DynamoDB is lacking basic operations or I am missing something (probably the second).


Solution

  • Fundamentally, this is not how DynamoDB works. DynamoDB's "tables" and "indexes" have almost nothing in common with SQL's "tables" and "indexes". It can take a while to "unlearn" the relational mindset.


    All data is stored in a partition, which can only be found by its Partition Key. There is no concept of "nearby" or "similar" partitions; you either know the Partition Key or not. (Imagine sorting your address book by the SHA-1 hash of people's names; there's no way to find "everyone named John", because the hashes will have nothing in common.)

    This is true in both Tables and Secondary Indexes - the "index" is essentially a second copy of the table, with exactly the same structure. (Secondary Indexes are allowed to have duplicate items, but that doesn't change how you access them.)

    If your Table or Index has only a Partition Key, there are exactly two ways of finding an item:

    1. Query it by knowing its Partition Key
    2. Scan all items, in no particular order, testing each one against a filter

    If your Table or Index has a Partition Key plus a Sort Key, there are exactly three ways of finding an item:

    1. Query it by knowing its Partition Key and Sort Key
    2. Scan all items, in no particular order, testing each one against a filter
    3. Query a single Partition, by knowing its Partition Key; then retrieve some or all items in that partition, in the order of their Sort Keys

    You can run any of those in a loop, either locally, or using batch operations in the DynamoDB API; but every read or write ultimately needs to identify items using one of those methods.


    AWS has quite a good section of the user manual about "Data modelling for DynamoDB". The key idea is that you design your Tables and Indexes around how you're going to use the data, not its abstract structure.

    So, if your use case involves finding "neighbours" according to some algorithm, you need those neighbours to be together in a single partition. That probably involves calculating a Partition Key which groups them in some way, and a Sort Key which identifies sub-sets within that group. That can be either the Partition Key + Sort Key of the main Table, or of a Secondary Index.

    I don't know much about geohashes, but a quick search suggests they're designed so that geographically close items are also lexically close, so you might be able to use:

    Once you have that, you can use a Query to:

    You might still need to Query two partitions because the values you want are "on the border"; but you won't need a Query for every item.


    For example, the data (in the table or GSI) could look like this:

    GeoHashPrefix (PK) Geohash (SK) Something Else
    AA AA123 blah blah
    AA AA128 rhubarb custard
    AA AA135 foo bar
    AA AA200 left right
    AB AB123 me you

    Then to find all geohashes starting with "AA" would just be a single Query with a KeyConditionExpression of:

    To find only those beginning with "AA1", you'd specify an extra condition on the Sort Key within that partition:

    Or to find all geohashes between "AA125" and "AA135" (inclusive):


    A final note is that DynamoDB is not intended to be a universal data store. If the data you have is not suited for partitioning, or the access you need cannot be optimised on that basis, you should pick a different technology.

    For instance, ElasticSearch / OpenSearch has optimised index for searching geospatial data based on various operators. There are also extensions for relational databases that add "GIS" functionality, such as PostGIS for PostgreSQL which is supported by AWS Aurora.