databasesearchfull-text-searchquery-optimizationtext-search

Databases: Effectively implement string contains query


I need a way to effectively do a string contains query like:

# In SQL
LIKE '%some-string%'

# In mongo
{ $regex: /some-string/ }

But its very slow when the dataset size is big. Eg. I tried in a dummy DB (with and without an index - no index is surprisingly faster on mongo) and generate 100m rows (in reality theres more). Seems reasonable if I use ElasticSearch, but I am wondering if theres a DB or way I can structure my data to optimise this use case? I asked and I really need contains instead of a prefix match ...


Solution

  • Postgresql offers so-called trigram indexes. Those indexes can accelerate SQL col LIKE '%search%' predicates efficiently enough. Notice that indexing can, in all makes of server, speed up col LIKE 'string%' (without the leading wildcard character).

    MySQL / Mariadb have FULLTEXT indexes that work with a distinctive SQL syntax. That feature works word-by-word unlike, well, LIKE which works character-by-character. Microsoft SQL Server has a similar feature with different syntax. It also works word-by-word.

    So, there's no SQL standard way to do this efficiently, and different makes of server do it differently.

    If you haven't yet chosen a particular make of server, you should figure out whether one of the full text schemes will serve your purpose. If you must get good performance from LIKE, postgresql's trigram indexing is the way to go.