sqlpostgresqlpattern-matchinglongest-prefix

Longest Prefix Match


What's the best way to get accurate and fast queries in PostgreSQL for a longest prefix match?

Is it:

A.) select * from table where column in (subselect) ;

B.) select * from table where strpos(column,column2) = 1
    order by length(column2) desc limit 1 ;

C.) select * from table where column ~ column2
    order by length(column2) desc limit 1

I'm planning to use in an update. Any ideas?


Solution

  • I wouldn't know of a function doing this out of the box in PostgreSQL.
    A recursive CTE would be the key element for a rather elegant solution (available in PostgreSQL 8.4 or later).

    I am assuming a table filter to hold the filter strings:

    CREATE TABLE filter (f_id int, string text);
    

    And a table tbl to be search for the longest match:

    CREATE TABLE tbl(t_id int, col text);
    

    Query

    WITH RECURSIVE
         f AS (SELECT f_id, string, length(string) AS flen FROM filter)
        ,t AS (SELECT t_id, col, length(col) AS tlen FROM tbl)
        ,x AS (
        SELECT t.t_id, f.f_id, t.col, f.string
              ,2 AS match, LEAST(flen, tlen) AS len
        FROM   t
        JOIN   f ON left(t.col, 1) = left(f.string, 1)
    
        UNION ALL
        SELECT t_id, f_id, col, string, match + 1, len
        FROM   x
        WHERE  left(col, match) = left(string, match)
        AND    match <= len
        )
    SELECT DISTINCT
           f_id
          ,string
          ,first_value(col) OVER w AS col
          ,first_value(t_id) OVER w AS t_id
          ,(first_value(match) OVER w -1) AS longest_match
    FROM   x
    WINDOW w AS (PARTITION BY f_id ORDER BY match DESC)
    ORDER  BY 2,1,3,4;
    

    Detailed explanation how the final SELECT works in this related answer.
    Working demo on sqlfiddle.

    You did not define which match to pick from a set of equally long matches. I am picking one arbitrary winner from ties.

    I'm planning to use in an update.

    PostgreSQL 9.1 introduced data modifying CTEs, so you can use this in an UPDATE statement directly.