sql ruby-on-rails postgresql levenshtein-distance fuzzy-search

How to create simple fuzzy search with PostgreSQL only?


I have a small problem with the search functionality on my RoR-based site. I have many Products, each with a CODE. The code can be any string, like "AB-123-lHdfj". Currently I use the ILIKE operator to find products:

Product.where("code ILIKE ?", "%" + params[:search] + "%")

It works fine, but it can't find products with codes like "AB123-lHdfj" or "AB123lHdfj".

What should I do about this? Maybe Postgres has a string normalization function, or some other method that could help me?


Solution

  • Postgres provides a module (fuzzystrmatch) with several string comparison functions, such as soundex and metaphone. For this case, though, you will want to use the levenshtein edit distance function.

    Example:
    
    test=# SELECT levenshtein('GUMBO', 'GAMBOL');
     levenshtein
    -------------
               2
    (1 row)
    

    The 2 is the edit distance between the two words: the number of single-character insertions, deletions, or substitutions needed to turn one into the other. When you apply this against a number of words and sort by the edit distance, you get the type of fuzzy matches that you're looking for.

    Try this sample query (with your own object names and data, of course):

    SELECT * 
    FROM some_table
    WHERE levenshtein(code, 'AB123-lHdfj') <= 3
    ORDER BY levenshtein(code, 'AB123-lHdfj')
    LIMIT 10
    

    This query says:

    Give me the 10 closest matches from some_table where the edit distance between the code value and the input 'AB123-lHdfj' is at most 3. You will get back rows where the value of code is within 3 single-character edits of 'AB123-lHdfj'.

    Note: if you get an error like:

    function levenshtein(character varying, unknown) does not exist
    

    then install the fuzzystrmatch extension using:

    test=# CREATE EXTENSION fuzzystrmatch;
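
To make the filter-and-rank logic of the query above more concrete, here is a rough pure-Ruby sketch of the same idea. It is only an illustration: it is not how Postgres implements levenshtein, the helper name fuzzy_match and the sample codes are made up for this example, and on a real site you would let the database do this work.

```ruby
# Textbook dynamic-programming Levenshtein distance:
# insertions, deletions and substitutions each cost 1.
def levenshtein(a, b)
  # prev[j] = distance between the already-processed prefix of a
  # and the first j characters of b.
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper mirroring the SQL above:
# WHERE distance <= max_distance ORDER BY distance LIMIT limit.
# Ties are broken by the code itself so the order is deterministic.
def fuzzy_match(codes, term, max_distance: 3, limit: 10)
  codes
    .map { |code| [code, levenshtein(code, term)] }
    .select { |_code, distance| distance <= max_distance }
    .sort_by { |code, distance| [distance, code] }
    .first(limit)
    .map(&:first)
end

# Made-up sample data for illustration only.
codes = ["AB-123-lHdfj", "AB123lHdfj", "ZZ-999-xxxxx"]
p fuzzy_match(codes, "AB123-lHdfj")
```

Both hyphen variants of the code are one edit away from the search term, so they pass the threshold, while the unrelated code is filtered out.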