databaseutf-8db2db2-luw

DB2 UTF8 storage configuration


I have DB2 database, and I have already defined tables with varchar(x), but it seems that x - is not the number of characters but the number of bytes. Is it possible to fix the issue somehow dynamically? or should I update my tables?


Solution

  • UTF-8 characters can be represented by byte sequences that are 1 to 4 bytes long, so it is impossible to know in advance the maximum byte length of a UTF-8 string that contains x Unicode characters.

    Beginning with DB2 10.5 fix pack 4 (the so called "Cancun release") you can specify string length units when defining columns: VARCHAR(x OCTETS) would mean x bytes; VARCHAR(x CODEUNITS16) would mean x two-byte characters, and VARCHAR(x CODEUNITS32) would mean x four-byte characters.

    If you don't specify units explicitly, the default is derived from the string_units database configuration parameter and from the NLS_STRING_UNITS global variable.

    GRAPHIC and VARGRAPHIC data types are also subject to the same unit specification, although the default unit for them is 2-byte characters, not octets.

    Note that character and CLOB columns are still subject to the same length limitations as before: for example, the maximum length of a VARCHAR column is 32672 bytes (OCTETS), or 16336 CODEUNITS16, or 8168 CODEUNITS32.