I am struggling to script a BULK INSERT for my .dat files. I am getting many errors depending on what I try, but I'm pretty sure it's related to the row delimiter. I can successfully import the same .dat files with the SQL Server Import Wizard. Under "Specify the characters that delimit the source file", I accept the following defaults:
Row delimiter: {CR}{LF}
Column delimiter: Comma {,}
The 500k rows and 158 columns come in just beautifully thanks to the wizard.
When scripting, I have tried all manner of \r\n variations, 0x0a, etc., and searched for help and reviewed similar posts. I can successfully script in a single line of real data, so I don't think the column parsing is the issue. It fails when I try 2 lines in a dummy file. The fewest errors occur with the following:
bulk insert abc
from 'C:\TestDat\try2.dat'
with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '\r\n')
go
Msg 4863, Level 16, State 1, Line 3
Bulk load data conversion error (truncation) for row 1, column 158 (Column 157).
(Using 0x0a creates even more errors on a 2-line file.)
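For reference, the 0x0a variant was along these lines:

bulk insert abc
from 'C:\TestDat\try2.dat'
with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '0x0a')
go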
.dat contents:
xxx,20080501,xxx,xxx:175000,55008654,0178636,202307011001,001859915,OK,W,xx,1.0,00000100,48,202306290100,R0,2,202306290200,R0,0,202306290300,R0,0,202306290400,R0,0,202306290500,R0,1,202306290600,R0,1,202306290700,R0,0,202306290800,R0,0,202306290900,R0,0,202306291000,R0,0,202306291100,R0,0,202306291200,R0,1,202306291300,R0,0,202306291400,R0,0,202306291500,R0,2,202306291600,R0,1,202306291700,R0,0,202306291800,R0,0,202306291900,R0,0,202306292000,R0,0,202306292100,R0,0,202306292200,R0,0,202306292300,R0,2,202306300000,R0,3,202306300100,R0,0,202306300200,R0,0,202306300300,R0,2,202306300400,R0,0,202306300500,R0,1,202306300600,R0,1,202306300700,R0,3,202306300800,R0,0,202306300900,R0,0,202306301000,R0,0,202306301100,R0,0,202306301200,R0,3,202306301300,R0,1,202306301400,R0,0,202306301500,R0,1,202306301600,R0,1,202306301700,R0,0,202306301800,R0,0,202306301900,R0,1,202306302000,R0,0,202306302100,R0,1,202306302200,R0,0,202306302300,R0,1,202307010000,R0,0
xxx,20080501,xxx,xxx:175000,55008654,0178636,202307011001,001859915,OK,W,xx,1.0,00000100,4,202307010100,R0,1,202307010200,R0,0,202307010300,R0,1,202307010400,R0,0
OK, I created the try2.dat file, and I created a table abc using the following (I left out Column3..Column157, but they have the same definition):
CREATE TABLE abc
(
Column1 VARCHAR(200),
Column2 VARCHAR(200),
...
Column158 VARCHAR(200),
Column159 VARCHAR(200)
)
Then I started SQLCMD:
D:\TEMP>sqlcmd
1> use test
2> go
Changed database context to 'TEST'.
1> delete from abc;
2>
3> go
(1 rows affected)
1>
2> bulk insert abc from 'd:\temp\try2.dat' with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '\r\n')
3> go
(1 rows affected)
So the import itself succeeds here, while you are getting an error. When selecting the table in SSMS, I do see:

(screenshot: start of the row)

(screenshot: end of the row)

The last column does not seem to be OK... more investigation needed 🤔🤔😁
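A quick way to check what ended up in that last column (a sketch, using the abc table from above):

-- ASCII 13 = CR, 10 = LF; a stray carriage return in the data would show up here
SELECT Column159,
       LEN(Column159) AS CharLen,
       ASCII(RIGHT(Column159, 1)) AS LastCharCode
FROM abc;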
A workaround could be:
Create a table like this:
CREATE TABLE def
(
Column1 VARCHAR(MAX)
);
and import using (since you do not have # characters in your file):
bulk insert def from 'd:\temp\try2.dat'
with (FIRSTROW = 1, fieldterminator = '#', rowterminator = '\r\n')
After that, you can do:
-- CROSS APPLY splits per row (a scalar subquery would fail once def has more than one row)
SELECT value
FROM def
CROSS APPLY STRING_SPLIT(Column1, ',');
Resulting in:
value |
---|
xxx |
20080501 |
xxx |
xxx:175000 |
55008654 |
0178636 |
202307011001 |
001859915 |
OK |
W |
xx |
1.0 |
00000100 |
48 |
202306290100 |
R0 |
........ |
NOTE: When your version of SQL Server is new enough (SQL Server 2022 / Azure SQL Database), you can add the enable_ordinal parameter to STRING_SPLIT.
NOTE2: Why was enable_ordinal added to STRING_SPLIT? Because in the old (or current 😢) way it works, STRING_SPLIT does not "care" about ordinal position (https://stackoverflow.com/a/64092709/724039).
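On SQL Server 2022 (and Azure SQL Database) a minimal sketch with the ordinal looks like this:

-- the third argument (enable_ordinal = 1) adds a 1-based 'ordinal' column
SELECT value, ordinal
FROM STRING_SPLIT('xxx,20080501,xxx:175000', ',', 1);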
I created a DBFIDDLE with your data; it has a line number R added to def. Part of the output of

SELECT R, value FROM def d CROSS APPLY STRING_SPLIT(d.Column1, ',');

(near the line change) is:
R | value |
---|---|
... | |
1 | 1 |
1 | 202307010400 |
1 | R0 |
1 | 0 |
2 | xxx |
2 | 20080501 |
2 | xxx |
2 | xxx:175000 |
... |
Unfortunately it is not possible (AFAIK) to number the values within each line in a reliable way with this approach...
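(When enable_ordinal is available, though, a sketch like the one below should give a reliable position per line; it assumes the def variant from the DBFIDDLE, i.e. with the line number R added.)

-- requires STRING_SPLIT with enable_ordinal (SQL Server 2022+ / Azure SQL Database)
SELECT d.R, s.ordinal, s.value
FROM def d
CROSS APPLY STRING_SPLIT(d.Column1, ',', 1) s
ORDER BY d.R, s.ordinal;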
ANOTHER ATTEMPT:
Seeing: "The column count per row will equal 14+n*3 where n = the value in the 14th column."
Define your import table (here given the clumsy name def2):
CREATE TABLE def2 (
Column1 VARCHAR(200),
Column2 VARCHAR(200),
Column3 VARCHAR(200),
Column4 VARCHAR(200),
Column5 VARCHAR(200),
Column6 VARCHAR(200),
Column7 VARCHAR(200),
Column8 VARCHAR(200),
Column9 VARCHAR(200),
Column10 VARCHAR(200),
Column11 VARCHAR(200),
Column12 VARCHAR(200),
Column13 VARCHAR(200),
Column14 VARCHAR(200),
ColumnRest VARCHAR(MAX)
);
Import your data:
bulk insert def2 from 'D:\Temp\try2.dat' with (fieldterminator = ',');
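Since each row has 14 + n*3 columns, ColumnRest should contain 3 * Column14 values, i.e. 3 * Column14 - 1 commas. A quick sanity check (a sketch):

-- expect CommaCount = 3 * Column14 - 1 for every row
SELECT Column14,
       LEN(ColumnRest) - LEN(REPLACE(ColumnRest, ',', '')) AS CommaCount
FROM def2;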
Finally, get the values, which can then be inserted into a (new) table:
WITH def2tmp AS (
    -- number the rows and append a trailing comma so the last value also ends in one
    SELECT ROW_NUMBER() OVER (ORDER BY ColumnRest) AS R,
           Column14,
           RTRIM(ColumnRest) + ',' AS ColumnRest
    FROM def2
), PosComma AS (
    -- anchor: the first value and the position of the first comma
    SELECT R,
           Column14,
           CHARINDEX(',', ColumnRest, 1) AS R2,
           SUBSTRING(ColumnRest, 1, CHARINDEX(',', ColumnRest, 1) - 1) AS value,
           ColumnRest
    FROM def2tmp
    UNION ALL
    -- recursion: step from comma to comma, extracting one value at a time
    SELECT R,
           Column14,
           CHARINDEX(',', ColumnRest, R2 + 1),
           SUBSTRING(ColumnRest, R2 + 1,
                     CASE WHEN (CHARINDEX(',', ColumnRest, R2 + 1) - R2) - 1 < 0
                          THEN 100
                          ELSE (CHARINDEX(',', ColumnRest, R2 + 1) - R2) - 1
                     END),
           ColumnRest
    FROM PosComma
    WHERE R2 < LEN(ColumnRest) AND R <= 2  -- R <= 2 limits this to the two sample rows
)
SELECT R,
       R2,
       ROW_NUMBER() OVER (PARTITION BY R ORDER BY R2) AS R3,
       value
FROM PosComma
ORDER BY Column14 DESC, R2
OPTION (MAXRECURSION 20000)
;
output (where R is the row number from your import table, R3 is the row number of the values, and R2 is the position of the , in ColumnRest):
P.S. Sorry for this messy output.... 😁
This solution gets you two tables: the first, def2, with the first 14 columns; the second with the values in each row (but you have to define that table yourself!).
R | R2 | R3 | value |
---|---|---|---|
1 | 13 | 1 | 202306290100 |
1 | 16 | 2 | R0 |
1 | 18 | 3 | 2 |
1 | 31 | 4 | 202306290200 |
1 | 34 | 5 | R0 |
1 | 36 | 6 | 0 |
1 | 49 | 7 | 202306290300 |
1 | 52 | 8 | R0 |
1 | 54 | 9 | 0 |
1 | 67 | 10 | 202306290400 |
1 | 70 | 11 | R0 |
1 | 72 | 12 | 0 |
1 | 85 | 13 | 202306290500 |
1 | ... | ... | ...... |
1 | 823 | 136 | 202306302200 |
1 | 826 | 137 | R0 |
1 | 828 | 138 | 0 |
1 | 841 | 139 | 202306302300 |
1 | 844 | 140 | R0 |
1 | 846 | 141 | 1 |
1 | 859 | 142 | 202307010000 |
1 | 862 | 143 | R0 |
1 | 864 | 144 | 0 |
2 | 13 | 1 | 202307010100 |
2 | 16 | 2 | R0 |
2 | 18 | 3 | 1 |
2 | 31 | 4 | 202307010200 |
2 | 34 | 5 | R0 |
2 | 36 | 6 | 0 |
2 | 49 | 7 | 202307010300 |
2 | 52 | 8 | R0 |
2 | 54 | 9 | 1 |
2 | 67 | 10 | 202307010400 |
2 | 70 | 11 | R0 |
2 | 72 | 12 | 0 |
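To actually persist these values, you could replace the final SELECT of the big statement above with an INSERT; values_per_row below is a hypothetical destination table (as said, you have to define your own):

-- hypothetical destination table for the split values
CREATE TABLE values_per_row (
    R     INT,            -- row number from the import table
    R3    INT,            -- position of the value within that row
    value VARCHAR(200)
);

-- replace the final SELECT of the WITH statement above with this:
INSERT INTO values_per_row (R, R3, value)
SELECT R,
       ROW_NUMBER() OVER (PARTITION BY R ORDER BY R2),
       value
FROM PosComma
OPTION (MAXRECURSION 20000);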