amazon-dynamodbdynamodb-queries

aws dynamodb ScanCommand table column with special characters


I have a dynamodb table that contains a column/attributes with special letter characters, e.g. é, ë and so on.

Is there a way to find the entries with just the "normal" letters (e.g. e).

scancommand looks like this

input = {
          "TableName": tableName,
          "FilterExpression": "contains(#name, :term)",
          "ExpressionAttributeNames": {
            "#name":"name",
          },
          "ExpressionAttributeValues": {
              ':term': event.pathParameters.term,
          }
        };

and should for example find the name "Zoe", even when the table contains "Zoë".


Solution

  • No, DynamoDB has a fairly limited selection of functions in FilterExpression, and one of them - contains() only supports exact substring matches.

    One commonly used solution is to denormalize the data, add another copy of the string which is more suitable for the search you need to do: In this case in addition to the "name" attribute you add to each item, add a new attribute, say "name_canonical" where if name was Zoë, name_canonical will be set to zoe. You can then do a filter on the name_canonical attribute. Obviously, each time you write an item or modify the name in it, you'll also need to update name_canonical accordingly.

    Finally I want to comment that what you are asking about doing is very inefficient: You are basically scanning the entire database, and doing string matching on everything. You'll be paying Amazon the cost of reading your entire database. A much more efficient solution would be a have the name_canonical as a key of the table (either of the original table or a GSI), and then instead of FilterExpression you'd use KeyConditionExpression - which will find just the data with the requested name, immediately, without paying to read the entire database.

    If you'll use KeyConditionExpression like I recommend, you'll see you won't be able to use contains() any more. You'll need to take the denormalization yet another step: For example, if name is Zoë Smith, you can create multiple name_canonical entries, one under zoe, one under smith, so that searching for the key zoe will find you the entries you want.

    Search engines do what I just described - splitting a string into separate words in their canonical forms - a process often called "lemmatization" - as part of their usual interface. But DynamoDB is not a search engine, so you need to do this on your own, or, like Leeroy said, integrate with yet another AWS product, a search engine (they recently announced a way to do this conveniently - see https://aws.amazon.com/blogs/aws/amazon-dynamodb-zero-etl-integration-with-amazon-opensearch-service-is-now-generally-available/).