javascriptamazon-dynamodbnosql

Simulating IN clause for NoSQL database


I want to simulate a SELECT * FROM TABLE WHERE ID IN {LIST_OF_IDS} query, I know Im using a NoSQL database (dynamoDB) but I have an index on the field for which I want to execute this query. I want to get all the posts from a user where the userID (index column) is the given list.

I am using Amplify as my managed backend and using graphql as my API. I know there is no IN keyword to use for graphql (at least nog APPSYNC). I have tried creating a custom resolver from a template provided by AWS like the following

import { util } from '@aws-appsync/utils';

/**
 * Gets items from the DynamoDB tables in batches with the provided `id` keys
 * @param {import('@aws-appsync/utils').Context} ctx the context
 * @returns {import('@aws-appsync/utils').DynamoDBBatchGetItemRequest} the request
 */
export function request(ctx) {
    return {
        operation: 'BatchGetItem',
        tables: {
            Posts: {
                keys: ctx.args.ids.map((id) => util.dynamodb.toMapValues({ id })),
                consistentRead: true,
            },
        },
    };
}

/**
 * Returns the BatchGetItem table items
 * @param {import('@aws-appsync/utils').Context} ctx the context
 * @returns {[*]} the items
 */
export function response(ctx) {
    if (ctx.error) {
        util.error(ctx.error.message, ctx.error.type);
    }
    return ctx.result.data.Posts;
}

But this doesnt seem to work. I would also like to avoid to run a query for each user in the list to retrieve the posts. All help is greatly appreciated, also if you have another architectural solution/proposal to solve this issue I am all ears!


Solution

  • The BatchGetItem request that you tried to use can be used to get N specific items in one request. The problem is is that in your case, you don't want to get N specific items - you want to do N queries in an index, and each of these queries can return more than one - potentially many - items. BatchGetItem can't do that.

    To query an index for one value and get the potentially many results from this query, you must use the Query API request - there is no other alternative. To do N of those queries, you unfortunately need to send N individual Query requests. If you send all of those requests in parallel, on N separate connections, this will be just as good as a BatchGetItem in all respects:

    1. BatchGetItem of N items isn't cheaper than N separate GetItem or Query requests, because you pay for each item individually in both cases.
    2. BatchGetItem of N items doesn't have lower latency than N separate GetItem or Query, if you do all these queries in parallel. In fact, because DynamoDB uses HTTP 1.1 with head-of-line blocking problems, doing N requests in parallel can be even faster.

    To make a long story short, I recommend that to simulate IN on an indexed column, you can spawn multiple connections and send multiple Query requests in parallel, one request to each of the searched values.