According to the spec, VARCHAR2(max_size CHAR) should store max_size characters. I observe strange behavior for Unicode text. Consider this example:
create table test (id varchar2(3 char) not null, primary key (id));
insert into test(id) values('abc');
insert into test(id) values('ффф');
Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 6, maximum: 3)
So does varchar2(3 char) actually mean the same as 3 bytes? No :)
create table test (id varchar2(3 byte) not null, primary key (id));
insert into test(id) values('abc');
insert into test(id) values('ффф');
Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 18, maximum: 3)
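The character-vs-byte distinction is easy to observe with Oracle's LENGTH and LENGTHB functions, which count characters and bytes respectively. A sketch, assuming an AL32UTF8 database character set:

```sql
-- LENGTH counts characters, LENGTHB counts bytes.
-- On an AL32UTF8 database each 'ф' occupies 2 bytes (d1 84),
-- so the same string measures 3 characters but 6 bytes.
SELECT LENGTH('ффф')  AS char_count,
       LENGTHB('ффф') AS byte_count
FROM DUAL;
```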
And my question remains: how do I tell Oracle that a varchar2 length is in characters of Unicode (UTF-8, to be precise) text?
Update: Is it possible to write a SQL query that shows all tables/columns whose length is defined in bytes?
Actually, my issue splits into two parts: the incorrect query encoding of TablePlus, and byte-based lengths (without the char suffix) on some columns :)
Update 2: Thanks to @Wernfried Domscheit! The following query shows the tables and columns whose varchar2 length is defined in bytes:
SELECT TABLE_NAME, COLUMN_NAME, DATA_LENGTH, CHAR_USED
FROM USER_TAB_COLUMNS
WHERE DATA_TYPE = 'VARCHAR2' AND CHAR_USED = 'B';
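Once byte-semantics columns are identified, they can be converted in place, and the session default can be changed so that new columns use character semantics. A sketch using the test table from above:

```sql
-- Switch an existing column from BYTE to CHAR semantics.
ALTER TABLE test MODIFY (id VARCHAR2(3 CHAR));

-- Make CHAR the default length semantics for this session,
-- so a plain VARCHAR2(3) means 3 characters rather than 3 bytes.
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
```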
Your example works for me:
SELECT *
FROM V$NLS_PARAMETERS
WHERE PARAMETER = 'NLS_CHARACTERSET';
PARAMETER VALUE
------------------------------
NLS_CHARACTERSET AL32UTF8
1 row selected.
CREATE TABLE TEST (ID VARCHAR2(3 CHAR));
Table created.
INSERT INTO TEST(ID) VALUES('abc');
1 row created.
INSERT INTO TEST(ID) VALUES('ффф');
1 row created.
Maybe a typo on your side?
Update:
It looks like your client uses the wrong character-set settings.
ф (U+0444: Cyrillic Small Letter Ef) has these byte values:
+-------------------------------------------------------------------------------+
|Encoding|hex |dec (bytes)|dec |binary |
+-------------------------------------------------------------------------------+
|UTF-8 |D1 84 |209 132 |53636 |11010001 10000100 |
|UTF-16BE|04 44 |4 68 |1092 |00000100 01000100 |
|UTF-16LE|44 04 |68 4 |17412 |01000100 00000100 |
|UTF-32BE|00 00 04 44|0 0 4 68 |1092 |00000000 00000000 00000100 01000100|
|UTF-32LE|44 04 00 00|68 4 0 0 |1141112832|01000100 00000100 00000000 00000000|
+-------------------------------------------------------------------------------+
DUMP should return Typ=1 Len=6 CharacterSet=AL32UTF8: d1,84,d1,84,d1,84, but you get ef,bf,bd, which is U+FFFD (Replacement Character). You are not inserting ффф; it is converted to ���.
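To check what actually landed in the table, one can dump the stored bytes. A sketch (format 1016 is DUMP's hex output plus the character-set name):

```sql
-- DUMP(expr, 1016) shows type, length, character set and hex byte values.
-- Correctly stored 'ффф' should show:
--   Typ=1 Len=6 CharacterSet=AL32UTF8: d1,84,d1,84,d1,84
SELECT id, DUMP(id, 1016) FROM test;
```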
I guess your client actually uses UTF-8 but you did not tell the database, so the database most likely assumes the client uses the default US7ASCII (or something else). The client sends 6 bytes (d1,84,d1,84,d1,84), but the Oracle database interprets them as 6 single-byte characters.
Typically you use the NLS_LANG environment variable to define this. However, DBeaver is Java-based, and Java/JDBC does not use the NLS_LANG settings, at least not by default.