[SOLVED] regexp_replace() - maybe collation issue

regexp_replace() - maybe collation issue

My regexp_replace does not remove Umlauts as expected. I thought this was because of the collation but my test did show differently. So, as far as I understand it, ü should not be part of [a-zA-Z0-9]. Apparently it is though:

select SYS_CONTEXT('USERENV','NLS_SORT') from dual;
select decode('ü','u',0,1) from dual;
select regexp_replace('ABcdäü~~~---  Ø asdsad 123 /() ´´´', '[^a-zA-Z0-9]', '') x from dual;

begin

  if 'ü' > 'z' then
    dbms_output.PUT_LINE(1);
  else
    dbms_output.PUT_LINE(0);
  end if;
        
  if 'ü' > 'Z' then
    dbms_output.PUT_LINE(1);
  else
    dbms_output.PUT_LINE(0);
  end if;
        
  if 'ü' > '9' then
    dbms_output.PUT_LINE(1);
  else
    dbms_output.PUT_LINE(0);
  end if;

end;

Results:

German
1
ABcdäüØasdsad123
1
1
1

Solution

So as far as I understand it ü should not be part

It depends from nls_session_parameters. There are two params:

SELECT * FROM nls_session_parameters WHERE parameter IN ('NLS_COMP', 'NLS_SORT');

NLS_COMP = how to compare characters (ANSI, BINARY, LINGUISTIC)
NLS_SORT = order of characters (BINARY, e.g. GERMAN, GERMAN_CI, GERMAN_AI) - _CI means case insensitive, _AI means accent insensitive

Because in your session NLS_SORT is set to GERMAN the letter ü is between a and z.

alter session set NLS_SORT=BINARY

...and the regexp will remove Umlauts.