Tags: sql, oracle-database, unicode, varchar, varchar2

Oracle varchar2 and unicode texts


According to the spec, VARCHAR2(max_size CHAR) should store up to max_size characters, but I observe strange behavior with Unicode text.

Consider this example:

create table test (id varchar2(3 char) not null, primary key (id));

insert into test(id) values('abc');
insert into test(id) values('ффф');

Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 6, maximum: 3)

So VARCHAR2(3 CHAR) actually means the same as 3 bytes? No :)

create table test (id varchar2(3 byte) not null, primary key (id));

insert into test(id) values('abc');
insert into test(id) values('ффф');

Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 18, maximum: 3)

And my question remains: how do I tell Oracle that the VARCHAR2 length is in Unicode characters (UTF-8, to be more precise)?
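The character-vs-byte distinction can be inspected directly in SQL: Oracle's LENGTH function counts characters, while LENGTHB counts bytes. A minimal sketch, assuming an AL32UTF8 database character set:

```sql
-- LENGTH counts characters, LENGTHB counts the bytes of the stored
-- representation. In an AL32UTF8 database, each 'ф' takes 2 bytes,
-- so 'ффф' is 3 characters but 6 bytes.
SELECT LENGTH('ффф')  AS char_count,   -- 3
       LENGTHB('ффф') AS byte_count    -- 6
FROM DUAL;
```

If a correctly sent 'ффф' still fails against VARCHAR2(3 CHAR), the bytes arriving at the server are not what the client displayed.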

Update: Is it possible to write a SQL query that shows all tables/columns whose length is defined in bytes?

Actually, my issue splits into two parts: incorrect query encoding in TablePlus, and lengths in bytes (without the CHAR suffix) on some columns :)

Update 2: Thanks to @Wernfried Domscheit!

This query shows the tables and columns with VARCHAR2 lengths specified in bytes:

SELECT TABLE_NAME, COLUMN_NAME, DATA_LENGTH, CHAR_USED
FROM USER_TAB_COLUMNS
WHERE DATA_TYPE = 'VARCHAR2' AND CHAR_USED = 'B';
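For any column that query flags, the length semantics can be switched in place. A hypothetical fix, using the `test` table from above (adjust table, column, and size to your schema):

```sql
-- Re-declare the limit as 3 characters instead of 3 bytes;
-- existing data is kept.
ALTER TABLE test MODIFY (id VARCHAR2(3 CHAR));

-- Optionally, make CHAR the default for DDL in this session, so new
-- VARCHAR2 columns without an explicit suffix use character semantics.
ALTER SESSION SET NLS_LENGTH_SEMANTICS = 'CHAR';
```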

Solution

  • Your example is working for me:

    SELECT * 
    FROM V$NLS_PARAMETERS 
    WHERE PARAMETER = 'NLS_CHARACTERSET';
    
    PARAMETER          VALUE      
    ------------------------------
    NLS_CHARACTERSET   AL32UTF8   
    
    1 row selected.
    
    
    
    CREATE TABLE TEST (ID VARCHAR2(3 CHAR));
    Table created.
    
    INSERT INTO TEST(ID) VALUES('abc');
    1 row created.
    
    INSERT INTO TEST(ID) VALUES('ффф');
    1 row created.
    

    Maybe a typo on your side?

    Update:

    It looks like your client is using the wrong character settings.

    ф (U+0444: Cyrillic Small Letter Ef) has these byte values:

    +-------------------------------------------------------------------------------+
    |Encoding|hex        |dec (bytes)|dec       |binary                             |
    +-------------------------------------------------------------------------------+
    |UTF-8   |D1 84      |209 132    |53636     |11010001 10000100                  |
    |UTF-16BE|04 44      |4 68       |1092      |00000100 01000100                  |
    |UTF-16LE|44 04      |68 4       |17412     |01000100 00000100                  |
    |UTF-32BE|00 00 04 44|0 0 4 68   |1092      |00000000 00000000 00000100 01000100|
    |UTF-32LE|44 04 00 00|68 4 0 0   |1141112832|01000100 00000100 00000000 00000000|
    +-------------------------------------------------------------------------------+
    

    DUMP should return Typ=1 Len=6 CharacterSet=AL32UTF8: d1,84,d1,84,d1,84, but you get ef,bf,bd, which is U+FFFD (Replacement Character).

    You don't insert ффф; it is converted to ���.

    I guess your client actually uses UTF-8, but you did not tell the database, so most likely the database assumes the client uses the default US7ASCII (or something else). The client sends 6 bytes (d1,84,d1,84,d1,84), but the Oracle database interprets them as 6 single-byte characters.
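The DUMP check described above can be run like this (DUMP with a second argument of 16 prints the stored bytes in hex):

```sql
-- Shows type, byte length, character set, and the raw stored bytes.
-- With a correctly configured client and an AL32UTF8 database this
-- should report Len=6 and the bytes d1,84,d1,84,d1,84 for 'ффф'.
SELECT DUMP(id, 16) FROM test WHERE id LIKE '%ф%';
```

If the dump shows ef,bf,bd sequences instead, the corruption happened on the way in, not in the table definition.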

    Typically you use the NLS_LANG environment variable to define this. However, DBeaver is Java-based, and Java/JDBC does not use the NLS_LANG settings - at least not by default.
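For OCI-based clients such as SQL*Plus, a minimal sketch of setting NLS_LANG, reusing the AL32UTF8 character set shown in the V$NLS_PARAMETERS query above (language and territory are illustrative; pick your own):

```shell
# NLS_LANG format: <LANGUAGE>_<TERRITORY>.<CLIENT_CHARACTER_SET>
# This tells the Oracle client libraries which encoding the client
# actually sends, so the server converts (or passes through) correctly.
export NLS_LANG="AMERICAN_AMERICA.AL32UTF8"
echo "$NLS_LANG"
```

Java/JDBC thin clients handle character conversion themselves and ignore this variable, which is why the fix differs per tool.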