Reading from CSV files using SQL Server 2016 (no format option available).
The files are crlf/windows and each value will be wrapped in double quotes. Files are located here, note that I changed them to crlf for consistency.
Every now and then specifically I see this often with numbers, when we attempt to do a replace(value, '"', '') and then cast to int, it will fail with error below
Conversion failed when converting the nvarchar value '331877 ' to data type int.
But if we do a substring() it produces a value that cannot be cast to int.
Super hacky IMHO because you need to guess at the length of the string using len().
I have verified that the results of both replace and substring look exactly alike, but when I run a
nullif(replace(value, '"', ''), substring(value, 2, len(value)-2))
it claims they are different somehow. (crosses eyes) They look the same... they are the same type... they are the same, but nullif recognizes that something is wrong. My code is below. Perhaps there is something with the coalition??
bulk insert #tmp
From 'C:\acquisition_samples.csv'
WITH
(
CODEPAGE = '65001'
,FIRSTROW = 2
,FIELDTERMINATOR = ','
,ROWTERMINATOR = '0x0A'
,batchsize=10
,TABLOCK
);
-- DDL
insert into acquisition_sample(fdc_id_of_sample_food, fdc_id_of_acquisition_food) -- UPDATE file name, and columns
select
nullif(REPLACE(t.fdc_id_of_sample_food, '"', ''), substring(t.fdc_id_of_sample_food,2,LEN(t.fdc_id_of_sample_food)-2))
, nullif(REPLACE(t.fdc_id_of_acquisition_food, '"', ''), substring(t.fdc_id_of_acquisition_food,2,LEN(t.fdc_id_of_acquisition_food)-2)) as wtf
, CAST(substring(t.fdc_id_of_sample_food,2,LEN(t.fdc_id_of_sample_food)-2) AS INT)
, t.fdc_id_of_acquisition_food
, CAST(substring(t.fdc_id_of_acquisition_food,2,LEN(t.fdc_id_of_acquisition_food)-3) AS INT)
from #tmp t
Thanks to the hints from @AlwaysLearning and @T N
This now becomes obvious that the reason why we get a different value is that substring was removing the extra character, but replace was only targeting the double quotes I fed to it.
UNICODE(RIGHT(value, 1))
REPLACE(Fieldname, CHAR(13), '')
This now becomes obvious that the reason why we get a different value is that substring was removing the extra character, but replace was only targeting the double quotes I fed to it.
I realized that the reason the text at the end of the line is pushed down by one line in the error console is that it's a carriage return.
This similar answer also helped. CSV Carriage Return Character