amazon-dynamodbboto3dynamodb-queriesamazon-dynamodb-index

return a constant number of rows from a dynamodb query using a filter expression and a row limit


I have a dynamodb table that looks like that

resource = boto3.resource("dynamodb")
resource.create_table(TableName="mytable", KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"},
                                              {"AttributeName": "sk", "KeyType": "RANGE"}],
                                                  AttributeDefinitions=[
                                                      {"AttributeName": "pk", "AttributeType": "S"},
                                                      {"AttributeName": "sk", "AttributeType": "S"},
                                                      {"AttributeName": "type", "AttributeType": "S"},
                                                      {"AttributeName": "brand", "AttributeType": "S"},
                                                      {"AttributeName": "creationDate", "AttributeType": "S"},
                                                  ],
                                                  ProvisionedThroughput={
                                                      "ReadCapacityUnits": 10,
                                                      "WriteCapacityUnits": 10,
                                                  },
                                                  GlobalSecondaryIndexes=[
                                                      {
                                                          "IndexName": "GSI1",
                                                          "KeySchema": [
                                                              {"AttributeName": "type", "KeyType": "HASH"},
                                                              {"AttributeName": "brand", "KeyType": "RANGE"},
                                                          ],
                                                          "Projection": {
                                                              "ProjectionType": "ALL"
                                                          },
                                                          "ProvisionedThroughput": {
                                                              "ReadCapacityUnits": 5,
                                                              "WriteCapacityUnits": 5,
                                                          },
                                                      },
                                                      {
                                                          "IndexName": "GSI2",
                                                          "KeySchema": [
                                                              {"AttributeName": "type", "KeyType": "HASH"},
                                                              {"AttributeName": "creationDate", "KeyType": "RANGE"},
                                                          ],
                                                          "Projection": {
                                                              "ProjectionType": "ALL"
                                                          },
                                                          "ProvisionedThroughput": {
                                                              "ReadCapacityUnits": 5,
                                                              "WriteCapacityUnits": 5,
                                                          },
                                                      }
                                                  ]
                                                  )

I would like to create a query that fetches all entries of type - lets say type "ABC". The query however should also only allow for certain brands. Lets say there are three

and that user has only access to two of them ["Addidas", "Nike"]. The query should be paginated with a constant page size (lets say 10 entries) and return LastEvaluatedKey. The returned data needs to be sorted by a field called creationDate.

To solve that I have created the following query:

response = resource.query(IndexName="GSI2", KeyConditions={
                                                  "type": {
                                                      "AttributeValueList": ["ABC"],
                                                      "ComparisonOperator": "EQ"
                                                  }
                                              },
                                              FilterExpression=Attr("brand").is_in(brands),
                                              Limit=no_of_items,
                                              ScanIndexForward=False
                                              )

It works as desired, as long as all brands are in the array. If they are not it will almost always yield less then ten entries since the limit is applied prior to the FilterExpression. Is there any way to model this to make pagination work in cases where not all brands are requested?


Solution

  • There is unfortunately no way to limit based on the count of results after filtering. You need to make an estimate of your filter yield (percent of likely matches) to calculate a pre-filter limit to use. Maybe estimate a bit high. If you don’t get enough, do a second fetch.