javascriptsqlregexpostgresqlregexp-replace

Alternative to \p{Lo} Unicode Class in PostgreSQL Regex


I encountered a problem when working with regular expressions in PostgreSQL. It turns out that PostgreSQL has no support for the \p{Lo} class, which corresponds to the "Letter, Other" category in the Unicode standard. I need to use this class in my regular expression, but I can't find a way to do this in PostgreSQL.

I'm looking for an alternative approach or way to define this class through the available Unicode ranges in PostgreSQL. I would not want to manually list all possible characters as there are too many.

So far I've found a few Unicode ranges that partially fit:

[\u0370-\u03FF\u0400-\u04FF\u0590-\u05FF\u0600-\u06FF\u3040-\u30FF\u30FF\u4E00-\u9FFF].

However these ranges are not enough to fully cover \p{Lo}. Maybe there is a more comprehensive way to represent \p{Lo} in PostgreSQL or some known method that can help to achieve similar functionality?

Any help or links to specific Unicode ranges that could match \p{Lo} would be greatly appreciated.

Thanks in advance!


Solution

  • You could create a function in perl to solve that straightforwardly:

    create or replace function matchunicodeplo(char) 
    returns bool as $$
        if ($_[0] =~ /\p{Lo}/g){    
            return true;
        }
        else {
            return false;
        }   
    $$ language plperl;
    

    To check if perl language is installed in your database

    select lanname 
    from pg_language 
    where lanname = 'plperl';
    

    If not installed, then (maybe necessary to install the package postgresql-plperl-your_ver_number)

    create extension plperl;
    

    Then just run

    select matchunicodeplo('あ');
    
    matchunicodeplo
    true