I want a function in Redshift that removes accents from words. I found a question on SO (question) with Python code for doing this. I have tried a few solutions, one of them being:
import unicodedata
def remove_accents(accented_string):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
Then I create the function in Redshift as follows:
create function remove_accents(accented_string varchar)
returns varchar
immutable
as $$
import unicodedata
def remove_accents(accented_string):
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
$$ language plpythonu;
And I apply it to a column with:
SELECT remove_accents(city) FROM info_geo
I get only NULL values. The city column is of type varchar. Why am I getting NULLs, and how can I fix it?
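The text between the $$ markers is already the body of the UDF, so your code only defines an inner remove_accents and never calls it. The body ends without returning anything, and the UDF yields NULL. The inner function also reads input_str, which is not defined anywhere; the argument is named accented_string. You don't need to define a Python function inside the UDF. Either add a call to the inner function and return its result, or write the logic directly in the body: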
create function remove_accents(accented_string varchar)
returns varchar
immutable
as $$
import unicodedata
nfkd_form = unicodedata.normalize('NFKD', accented_string)
return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
$$ language plpythonu;
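To see the difference outside Redshift, here is a minimal local sketch. It assumes the UDF body behaves like the body of an ordinary Python function; the wrapper names udf_body_without_call and udf_body_with_call are hypothetical and exist only for this illustration. It runs in plain Python, not inside Redshift.

```python
# -*- coding: utf-8 -*-
import unicodedata


def udf_body_without_call(accented_string):
    # Mirrors the UDF from the question: the inner function is defined
    # but never called, so the body falls through and returns None.
    def remove_accents(s):
        nfkd_form = unicodedata.normalize('NFKD', s)
        return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])


def udf_body_with_call(accented_string):
    # Same inner function, but the body calls it and returns the result.
    def remove_accents(s):
        nfkd_form = unicodedata.normalize('NFKD', s)
        return u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
    return remove_accents(accented_string)


print(udf_body_without_call(u"S\u00e3o Paulo"))  # None
print(udf_body_with_call(u"S\u00e3o Paulo"))     # Sao Paulo
```

The first wrapper returns None, which is what surfaces as NULL in the query; the second returns the stripped string.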