I have data in VARCHAR(MAX)
source, that I need to insert into another table as VARCHAR(N)
.
The collation I need to use is UTF8 (specifically Czech_100_CI_AS_SC_UTF8
).
The problem is, even if I cut down the data up to maximum permissible length (using LEFT
function, or SUBSTRING
, or whatever), the insert fails for some data, with
Msg 2628, Level 16, State 1, Line 25
String or binary data would be truncated in table 'tablename', column 'columnname'. Truncated value: 'sometext'.
My question is, how to properly shorten the text, so it will fit, but still keep as much of it as possible?
The actual length of the string will depend on characters used, for latin-only texts the full length can be used. If there are a few accented characters, that will shorten the usable length by a bit. If the text is fully Unicode (non-latin script, e.g. Japanese), the usable length might be cut in half, or third.
Steps to reproduce:
DROP TABLE IF EXISTS [#tmpdl]
GO
CREATE TABLE [#tmpdl] ([Txt] VARCHAR(10) COLLATE Czech_100_CI_AS_SC_UTF8 NULL)
GO
INSERT INTO [#tmpdl] ([Txt]) VALUES (LEFT('123456789abcd', 10)) --this works
GO
INSERT INTO [#tmpdl] ([Txt]) VALUES (LEFT('123456789ábcd', 10)) --this fails
GO
INSERT INTO [#tmpdl] ([Txt]) VALUES (LEFT(N'一二三四', 10)) --this also fails
GO
even if I cut down the data up to maximum permissible length (using
LEFT
function, orSUBSTRING
, or whatever), the insert fails for some data
With LEFT
or SUBSTRING
you are cutting it down to the specified number of characters, whereas the 10
in varchar(10)
specifies the number of bytes, so after your truncation with LEFT
or SUBSTRING
you are left with a string that may still have more bytes than 10 - which is the same, for the purpose of insertion, as if you never truncated it in the first place.
A simple way to truncate the string to a number of characters rather than bytes is to use CAST
or CONVERT
, because they cut the string before the surrogate pair that doesn't fit and keep the original collation:
INSERT INTO [#tmpdl] ([Txt]) VALUES (cast(N'123456789ábcd' COLLATE Czech_100_CI_AS_SC_UTF8 as varchar(10)));
INSERT INTO [#tmpdl] ([Txt]) VALUES (cast(N'一二三四' COLLATE Czech_100_CI_AS_SC_UTF8 as varchar(10)));
Do note that this requires SQL Server 2012 or later.