I use PostgreSQL 10 and I run CREATE EXTENSION unaccent;
succesfully. I have a plgsql function that contains the following
whereText := 'lower(unaccent(place.name)) LIKE lower(unaccent($1))';
later, according to what user chose, more clauses may be added to the whereText
.
The whereText
is finally used in the query:
placewithkeys := '%'||placename||'%';
RETURN QUERY EXECUTE format('SELECT id, name FROM '||fromText||' WHERE '||whereText)
USING placewithkeys , event, date;
The whereText := 'LOWER(unaccent(place.name)) LIKE LOWER(unaccent($1))';
does not work, even if I remove the LOWER
part.
I do select __my_function('Τζι');
and I get nothing back, even though I should get back results, because in the database there is the name Τζίμα
If I remove the unaccent
and leave the LOWER
it works, but not for accents : τζ
brings Τζίμα
back as it should. It seems like the unaccent
is causing a problem.
What am I missing? How can I fix this?
Since there were comments about the syntax and possible SQLi , I provide the whole function definition, now changed to work accent-insensitive and case-insensitive in Greek:
CREATE FUNCTION __a_search_place
(placename text, eventtype integer, eventdate integer, eventcentury integer, constructiondate integer, constructioncentury integer, arstyle integer, artype integer)
RETURNS TABLE
(place_id bigint, place_name text, place_geom geometry)
AS $$
DECLARE
selectText text;
fromText text;
whereText text;
usingText text;
placewithkeys text;
BEGIN
fromText := '
place
JOIN cep ON place.id = cep.place_id
JOIN event ON cep.event_id = event.id
';
whereText := 'unaccent(place.name) iLIKE unaccent($1)';
placewithkeys := '%'||placename||'%';
IF constructiondate IS NOT NULL OR constructioncentury IS NOT NULL OR arstyle IS NOT NULL OR artype IS NOT NULL THEN
fromText := fromText || '
JOIN construction ON cep.construction_id = construction.id
JOIN construction_atype ON construction.id = construction_atype.construction_id
JOIN construction_astyle ON construction.id = construction_astyle.construction_id
JOIN atype ON atype.id = construction_atype.atype_id
JOIN astyle ON astyle.id = construction_astyle.astyle_id
';
END IF;
IF eventtype IS NOT NULL THEN
whereText := whereText || 'AND event.type = $2 ';
END IF;
IF eventdate IS NOT NULL THEN
whereText := whereText || 'AND event.date = $3 ';
END IF;
IF eventcentury IS NOT NULL THEN
whereText := whereText || 'AND event.century = $4 ';
END IF;
IF constructiondate IS NOT NULL THEN
whereText := whereText || 'AND construction.date = $5 ';
END IF;
IF constructioncentury IS NOT NULL THEN
whereText := whereText || 'AND construction.century = $6 ';
END IF;
IF arstyle IS NOT NULL THEN
whereText := whereText || 'AND astyle.id = $7 ';
END IF;
IF artype IS NOT NULL THEN
whereText := whereText || 'AND atype.id = $8 ';
END IF;
whereText := whereText || '
GROUP BY place.id, place.geom, place.name
';
RETURN QUERY EXECUTE format('SELECT place.id, place.name, place.geom FROM '||fromText||' WHERE '||whereText)
USING placewithkeys, eventtype, eventdate, eventcentury, constructiondate, constructioncentury, arstyle, artype ;
END;
$$
LANGUAGE plpgsql;
unaccent()
now works for Greek letters, too. Diacritic signs are removed:
db<>fiddle here
Allow unaccent to remove accents from Greek characters (Tasos Maschalidis)
unaccent()
does not yet work for Greek letters. The call:
SELECT unaccent('
ἀ ἁ ἂ ἃ ἄ ἅ ἆ ἇ Ἀ Ἁ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ
ἐ ἑ ἒ ἓ ἔ ἕ Ἐ Ἑ Ἒ Ἓ Ἔ Ἕ
ἠ ἡ ἢ ἣ ἤ ἥ ἦ ἧ Ἠ Ἡ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ
ἰ ἱ ἲ ἳ ἴ ἵ ἶ ἷ Ἰ Ἱ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ
ὀ ὁ ὂ ὃ ὄ ὅ Ὀ Ὁ Ὂ Ὃ Ὄ Ὅ
ὐ ὑ ὒ ὓ ὔ ὕ ὖ ὗ Ὑ Ὓ Ὕ Ὗ
ὠ ὡ ὢ ὣ ὤ ὥ ὦ ὧ Ὠ Ὡ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ
ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ
ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ
ᾠ ᾡ ᾢ ᾣ ᾤ ᾥ ᾦ ᾧ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ
ᾰ ᾱ ᾲ ᾳ ᾴ ᾶ ᾷ Ᾰ Ᾱ Ὰ Ά ᾼ ᾽ ι ᾿
῀ ῁ ῂ ῃ ῄ ῆ ῇ Ὲ Έ Ὴ Ή ῌ ῍ ῎ ῏
ῐ ῑ ῒ ΐ ῖ ῗ Ῐ Ῑ Ὶ Ί ῝ ῞ ῟
ῠ ῡ ῢ ΰ ῤ ῥ ῦ ῧ Ῠ Ῡ Ὺ Ύ Ῥ ῭ ΅ `
ῲ ῳ ῴ ῶ ῷ Ὸ Ό Ὼ Ώ ῼ ´ ῾ ');
... returns all letters unchanged, no diacritic signs removed as we would expect.
(I extracted this list from the Wikipedia page on Greek diacritics.)
db<>fiddle here
Looks like a shortcoming of the unaccent module. You can extend the default unaccent
dictionary or create your own. There are instructions in the manual. I created several dictionaries in the past and it's simple. And you are not to first to need this:
Postgres unaccent rules for greek characters:
Unaccent rules plus greek characters for Postgres 9.6:
You need write access to the file system of the server, though - the directory containing the unaccent files. So, not possible on most cloud services ...
Or you might report a bug and ask to include Greek diacritic signs.
The code fragments you presented are not vulnerable to SQL injection. $1
is concatenated as literal string and only resolved in the EXECUTE
command later, where the value is safely passed with the USING
clause. So, no unsafe concatenation there. I would do it like this, though:
RETURN QUERY EXECUTE format(
$q$
SELECT id, name
FROM place ...
WHERE lower(unaccent(place.name)) LIKE '%' || lower(unaccent($1)) || '%'
$q$
)
USING placename, event, date;
Notes:
Less confusing - your original even confused Pavel in the comments, a professional in the field.
Assignments in plpgsql are slightly expensive (more so than in other PL), so adopt a coding style with few assignments.
Concatenate the two %
symbols for LIKE
into the main query directly, giving the query planner the information that the pattern is not anchored to start or end, which may help a more efficient plan. Only the user input is (safely) passed as variable.
Since your WHERE
clause references table place
, The FROM
clause needs to include this table anyway. So you cannot concatenate the FROM clause independently to begin with. Probably better to keep it all in a single format()
.
Use dollar-quoting so you don't have to escape single quotes additionally.
Maybe just use ILIKE
instead of lower(...) LIKE lower(...)
. If you work with trigram indexes (like would seem best for this query): those work with ILIKE
as well:
I assume you are aware that you may need to escape characters with special meanings in LIKE
pattern?
After you provided your complete function ...
CREATE OR REPLACE FUNCTION __a_search_place(
placename text
, eventtype int = NULL
, eventdate int = NULL
, eventcentury int = NULL
, constructiondate int = NULL
, constructioncentury int = NULL
, arstyle int = NULL
, artype int = NULL)
RETURNS TABLE(place_id bigint, place_name text, place_geom geometry) AS
$func$
BEGIN
-- RAISE NOTICE '%', concat_ws(E'\n' -- to debug
RETURN QUERY EXECUTE concat_ws(E'\n'
,'SELECT p.id, p.name, p.geom
FROM place p
WHERE unaccent(p.name) ILIKE (''%'' || unaccent($1) || ''%'')' -- no $-quotes
-- any input besides placename ($1)
, CASE WHEN NOT ($2,$3,$4,$5,$6,$7,$8) IS NULL THEN
'AND EXISTS (
SELECT
FROM cep
JOIN event e ON e.id = cep.event_id' END
-- constructiondate, constructioncentury, arstyle, artype
, CASE WHEN NOT ($5,$6,$7,$8) IS NULL THEN
'JOIN construction con ON cep.construction_id = con.id
JOIN construction_atype ON con.id = construction_atype.construction_id
JOIN construction_astyle ON con.id = construction_astyle.construction_id' END
-- arstyle, artype
, CASE WHEN NOT ($7,$8) IS NULL THEN
'JOIN atype ON atype.id = construction_atype.atype_id
JOIN astyle ON astyle.id = construction_astyle.astyle_id' END
, CASE WHEN NOT ($2,$3,$4,$5,$6,$7,$8) IS NULL THEN
'WHERE cep.place_id = p.id' END
, CASE WHEN eventtype IS NOT NULL THEN 'AND e.type = $2' END
, CASE WHEN eventdate IS NOT NULL THEN 'AND e.date = $3' END
, CASE WHEN eventcentury IS NOT NULL THEN 'AND e.century = $4' END
, CASE WHEN constructiondate IS NOT NULL THEN 'AND con.date = $5' END
, CASE WHEN constructioncentury IS NOT NULL THEN 'AND con.century = $6' END
, CASE WHEN arstyle IS NOT NULL THEN 'AND astyle.id = $7' END
, CASE WHEN artype IS NOT NULL THEN 'AND atype.id = $8' END
, CASE WHEN NOT ($2,$3,$4,$5,$6,$7,$8) IS NULL THEN
')' END
);
USING placename
, eventtype
, eventdate
, eventcentury
, constructiondate
, constructioncentury
, arstyle
, artype;
END
$func$ LANGUAGE plpgsql;
This is a complete rewrite with several improvements. Should make the function considerably. Also SQLi-safe (like your original). Should be functionally identical except the cases where I join fewer tables, which might not filter rows that are filtered by joining to the tables alone.
Major features:
Use EXISTS()
instead of lots of joins in the outer level plus GROUP BY
. This contributes the lion share to the better performance. Related:
format()
is typically a good choice to concatenate SQL from user input. But since you encapsulated all code elements and only pass flags, you don't need it in this case. Instead, concat_ws()
is of help. Related:
Only concatenate JOINs you actually need.
Fewer assignments, shorter code.
Default values for parameters. Allows simplified call with missing parameters. Like:
SELECT __a_search_place('foo', 2, 3, 4);
SELECT __a_search_place('foo');
Related:
About the short ROW()
syntax for testing whether any value is NOT NULL
: