I am currently working on an online platform and one functionality is where people can register and search for other users.
For better usability I want to implement typeahead search in the GUI where you can search after other usernames. Everything is hosted on AWS and so the DB is a DynamoDB.
The table is called users and the partition key is called "usernames". The big problem here is, that I cannot use "begins_with" with the primary key as I always have to specify the key in the queries like this:
Key: { username: "mario12" }
for my typeahead functionality it is very unfortunate that this is not allowed:
Key: { begins_with(username, "mari") }
Eg if somebody starts typing "mari" if he is looking for the user with username "mario12". I would have to scan the whole table which is NOT recommended at all as it is very inefficient and quite costy on bigger tables.
MY SOLUTION:
So I had the idea of the following solution which I already implemented as a proof of concept: Creating a new typeahead lookup table with 2 elements:
starting_letters
as partition key: holds starting letters entered in the typeahead search barusernames
: List of usernames that start with the starting lettersSo if a new user signs up, eg user mario12
, I save it to the users table like before, but I also insert following values in the new typeahead lookup table:
starting_letters | usernames |
---|---|
mari | [mario12] |
mario | [mario12] |
mario1 | [mario12] |
mario12 | [mario12] |
and when another user signs up, eg user marionaut3
, the typeahead table looks like this:
starting_letters | usernames |
---|---|
mari | [mario12,marionaut3] |
mario | [mario12,marionaut3] |
mario1 | [mario12] |
mario12 | [mario12] |
marion | [marionaut3] |
mariona | [marionaut3] |
marionau | [marionaut3] |
marionaut | [marionaut3] |
marionaut3 | [marionaut3] |
and when a third user signs up, eg user marion5
, the typeahead table looks like this:
starting_letters | usernames |
---|---|
mari | [mario12,marionaut3,marion5] |
mario | [mario12,marionaut3,marion5] |
mario1 | [mario12] |
mario12 | [mario12] |
marion | [marionaut3,marion5] |
marion5 | [marion5] |
mariona | [marionaut3] |
marionau | [marionaut3] |
marionaut | [marionaut3] |
marionaut3 | [marionaut3] |
I think you get the idea....
So when somebody starts typing "mario" in the GUI, I can query the partition key "mario" of my typeahead lookup table and return the array [mario12,marionaut3,marion5]
.
When he continues to type "marion", the query returns the array [marionaut3,marion5]
as "mario12" doesn't match anymore.
In this case I don't need an expensive scan and can simply query the partition key.
I know - in the beginning there are many fresh inserts if there are new unique usernames, but my usernames and typeahead feature starts working after 4 characters and after many usernames have been inserted already, only new values for new username endings trigger new partition keys. Also my usernames can only contain characters a-zA-Z0-9_- and have a limited length, so yes, there are many (but still limited) variations that Dynamo should be able to handle.
Now my question: What do you think of my approach and are there any better solutions than this if you have DynamoDB and want to implement typeahead searches?
Your approach will work. However…
Let me offer an alternative which will involve less replication of username substrings. Have the partition key be the first four letters of the username. Have the sort key be each of the usernames that have those four letters.
starting_letters (PK) | usernames (SK) |
---|---|
mari | mario12 |
marionaut3 | |
marion5 |
Thus every username is stored just once and directly in this table under the item collection of its leading four letters. You can issue a query providing the right starting_letters prefix and provide a begins_with on the sort key to find usernames that start with all the given letters.