I have DB2 database, and I have already defined tables with varchar(x)
, but it seems that x
- is not the number of characters but the number of bytes. Is it possible to fix the issue somehow dynamically? or should I update my tables?
UTF-8 characters can be represented by byte sequences that are 1 to 4 bytes long, so it is impossible to know in advance the maximum byte length of a UTF-8 string that contains x
Unicode characters.
Beginning with DB2 10.5 fix pack 4 (the so called "Cancun release") you can specify string length units when defining columns: VARCHAR(x OCTETS)
would mean x bytes; VARCHAR(x CODEUNITS16)
would mean x two-byte characters, and VARCHAR(x CODEUNITS32)
would mean x four-byte characters.
If you don't specify units explicitly, the default is derived from the string_units
database configuration parameter and from the NLS_STRING_UNITS
global variable.
GRAPHIC
and VARGRAPHIC
data types are also subject to the same unit specification, although the default unit for them is 2-byte characters, not octets.
Note that character and CLOB columns are still subject to the same length limitations as before: for example, the maximum length of a VARCHAR
column is 32672 bytes (OCTETS
), or 16336 CODEUNITS16
, or 8168 CODEUNITS32
.