sqlsql-serverdatalengthmultibyte-characters

DATALENGTH in SQL Server returns 1 BYTE for multibyte character


I need to check in SQL Server how many bytes are used by a value. According to the documentation the DATALENGTH() function should return values in bytes, but it seems that's not the case. Oracle and Postgres work as expected.

-- Oracle, 3 bytes
SELECT LENGTHB('和');    
-- Postgres, 3 bytes
SELECT OCTET_LENGTH('和'); 
**-- SQL Server, 1 byte
SELECT DATALENGTH('和');** 

Should I use another function?


Solution

  • Firstly, SELECT DATALENGTH('和'); returning 1 is correct. '和' is very likely outside the base codepage you are using, which means that you are effectively asking for the DATALENGTH of '?', and that does only consist of 1 byte.

    Presumably your Postgres and Oracle environments are in UTF-8, where the character takes up 3 bytes. In UCS-2/UTF-16, however. the character only takes up 2 bytes. When making these comparisons, you need to actually make the test fair; have the data type and code pages all be the same.

    If you compare the DATALENGTHs of your string as an ANSI varchar, UCS-2 nvarchar, and a UTF-8 varchar, you get the values 1, 2 and 3 respectively:

    SELECT DATALENGTH('和') AS [ANSI Length],
           DATALENGTH(N'和') AS [UCS-2 Length],
           DATALENGTH(CONVERT(varchar(4),N'和' COLLATE Latin1_General_100_CI_AI_WS_SC_UTF8)) AS [UTF-8 Length];