
How can I use a Parallel Scan in AWS DynamoDB to scan a specific GSI?


I have a DynamoDB table where each item has an attribute named 'DataType'. There is also a GSI on this table with 'DataType' as the HashKey and 'timestamp' as the RangeKey.

Around 10 per cent of the table items have a 'DataType' value of 'A'.

I want to scan all the items of this GSI with the HashKey fixed as 'A'. Is there any way to do this using a scan or parallel scan? Or do I need to use a query on the GSI itself?

As per the documentation (https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/spec/ScanSpec.html), I could not find any way to specify a GSI to scan with the HashKey fixed.


Solution

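  • Given that you only want to look at items with the hash key "A", you'll need to use the Query API rather than the Scan API: provide the index name and query for the items in the index that have that partition key. A Scan, parallel or not, reads every item and only applies a filter afterwards, so it does far more work than a Query that targets one partition key.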

    Here's some sample code using the Node AWS SDK V3 that creates three items in a table with a Global Secondary Index (called GSI1). Two of the items have a GSI1PK value of "orange", while the other has a GSI1PK value of "gold". The query returns the two matches:

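        import {
            DynamoDBClient,
            PutItemCommand,
            QueryCommand,
            QueryCommandInput,
        } from '@aws-sdk/client-dynamodb';
        import { marshall, unmarshall } from '@aws-sdk/util-dynamodb';

        // getFromEnv, log, and wait are small helper functions that are not shown here.
        const tableName = getFromEnv(`TABLE_NAME`);
        const client = new DynamoDBClient({});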
    
        async function createItem(
            name: string,
            PK: string,
            SK: string,
            GSI1PK: string,
            GSI1SK: string,
        ): Promise<void> {
            const item = { PK, SK, GSI1PK, GSI1SK, name };
            const putCommand = new PutItemCommand({
                TableName: tableName,
                Item: marshall(item)
            });
            await client.send(putCommand);
            log(`Created item: ${name} with GSI1PK ${GSI1PK}`);
        }
    
        await createItem(`foo`, `fooPK`, `fooSK`, `orange`, `blue`);
        await createItem(`bar`, `barPK`, `barSK`, `orange`, `white`);
        await createItem(`baz`, `bazPK`, `bazSK`, `gold`, `garnet`);
    
        log(`Waiting 5 seconds, as GSIs don't support consistent reads`)
        await wait(5);
    
        const query: QueryCommandInput = {
            TableName: tableName,
            IndexName: `GSI1`,
            KeyConditionExpression: `#pk = :pk`,
            ExpressionAttributeNames: {
                '#pk': `GSI1PK`,
            },
            ExpressionAttributeValues: {
                ':pk': { S: `orange` },
            },
        }
    
        const result = await client.send(new QueryCommand(query));
        log(`Querying GSI1 for "orange"`);
        // Items is optional in the response type, so fall back to an empty list.
        (result.Items ?? []).forEach((entry) => {
            log(`Received: `, unmarshall(entry).name);
        });
    

    This produces the following output:

    Created item: foo with GSI1PK orange
    Created item: bar with GSI1PK orange
    Created item: baz with GSI1PK gold
    Waiting 5 seconds, as GSIs don't support consistent reads
    Querying GSI1 for "orange"
    Received:  foo
    Received:  bar
    

    One thing worth noting from this example is that GSIs don't support strongly consistent reads; a query against a GSI is always eventually consistent. So if your use case requires immediate consistency, you'll need to find another solution.
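
Since roughly 10 per cent of the table matches 'A', a single Query call will probably not return everything: each call returns at most 1 MB of data and sets LastEvaluatedKey when more items remain. The following is a minimal sketch of the pagination loop, not a definitive implementation. The names `Page`, `queryAllPages`, and `fetchPage` are hypothetical, and the loop is written against a generic page-fetching function so it can be tried without a live table.

```typescript
// Minimal shape of the two Query response fields the loop relies on.
interface Page<T, K> {
    Items?: T[];
    LastEvaluatedKey?: K;
}

// Hypothetical helper: keeps calling fetchPage, passing the previous
// LastEvaluatedKey as the start key, until no key is returned.
async function queryAllPages<T, K>(
    fetchPage: (startKey?: K) => Promise<Page<T, K>>,
): Promise<T[]> {
    const items: T[] = [];
    let startKey: K | undefined;
    do {
        const page: Page<T, K> = await fetchPage(startKey);
        // A page may be empty and still carry a LastEvaluatedKey.
        items.push(...(page.Items ?? []));
        startKey = page.LastEvaluatedKey;
    } while (startKey !== undefined);
    return items;
}

// Possible wiring with the SDK, assuming the client and tableName from the
// example above and a hypothetical index name 'DataTypeIndex':
//
// const items = await queryAllPages((startKey?: Record<string, AttributeValue>) =>
//     client.send(new QueryCommand({
//         TableName: tableName,
//         IndexName: 'DataTypeIndex',
//         KeyConditionExpression: '#dt = :a',
//         ExpressionAttributeNames: { '#dt': 'DataType' },
//         ExpressionAttributeValues: { ':a': { S: 'A' } },
//         ExclusiveStartKey: startKey,
//     })));
```

Collecting every item in memory is only reasonable for modest result sets; for a large partition it may be better to process each page as it arrives.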