My regexp_replace
does not remove Umlauts as expected. I thought this was because of the collation but my test did show differently. So, as far as I understand it, ü
should not be part of [a-zA-Z0-9]
. Apparently it is though:
select SYS_CONTEXT('USERENV','NLS_SORT') from dual;
select decode('ü','u',0,1) from dual;
select regexp_replace('ABcdäü~~~--- Ø asdsad 123 /() ´´´', '[^a-zA-Z0-9]', '') x from dual;
begin
if 'ü' > 'z' then
dbms_output.PUT_LINE(1);
else
dbms_output.PUT_LINE(0);
end if;
if 'ü' > 'Z' then
dbms_output.PUT_LINE(1);
else
dbms_output.PUT_LINE(0);
end if;
if 'ü' > '9' then
dbms_output.PUT_LINE(1);
else
dbms_output.PUT_LINE(0);
end if;
end;
Results:
German
1
ABcdäüØasdsad123
1
1
1
So as far as I understand it ü should not be part
It depends from nls_session_parameters
. There are two params:
SELECT * FROM nls_session_parameters WHERE parameter IN ('NLS_COMP', 'NLS_SORT');
ANSI
, BINARY
, LINGUISTIC
)BINARY
, e.g. GERMAN
, GERMAN_CI
, GERMAN_AI
) - _CI
means case insensitive, _AI
means accent insensitiveBecause in your session NLS_SORT
is set to GERMAN
the letter ü
is between a
and z
.
alter session set NLS_SORT=BINARY
...and the regexp will remove Umlauts.