I encountered a problem when working with regular expressions in PostgreSQL. It turns out that PostgreSQL has no support for the \p{Lo}
class, which corresponds to the "Letter, Other" category in the Unicode standard. I need to use this class in my regular expression, but I can't find a way to do this in PostgreSQL.
I'm looking for an alternative approach or way to define this class through the available Unicode ranges in PostgreSQL. I would not want to manually list all possible characters as there are too many.
So far I've found a few Unicode ranges that partially fit:
[\u0370-\u03FF\u0400-\u04FF\u0590-\u05FF\u0600-\u06FF\u3040-\u30FF\u30FF\u4E00-\u9FFF].
However these ranges are not enough to fully cover \p{Lo}
. Maybe there is a more comprehensive way to represent \p{Lo}
in PostgreSQL or some known method that can help to achieve similar functionality?
Any help or links to specific Unicode ranges that could match \p{Lo}
would be greatly appreciated.
Thanks in advance!
You could create a function in perl to solve that straightforwardly:
create or replace function matchunicodeplo(char)
returns bool as $$
if ($_[0] =~ /\p{Lo}/g){
return true;
}
else {
return false;
}
$$ language plperl;
To check if perl language is installed in your database
select lanname
from pg_language
where lanname = 'plperl';
If not installed, then (maybe necessary to install the package postgresql-plperl-your_ver_number)
create extension plperl;
Then just run
select matchunicodeplo('あ');
matchunicodeplo |
---|
true |