sql-serverbulkinsertbulk

BULK INSERT fails - tried many variations on row delimiter


I am struggling to script a BULK INSERT for my .dat files. I am getting many errors depending on what I try, but I'm pretty sure it's related to the row delimiter. I can successfully use the SQL import wizard for my .dat files. Under "Specify the characters that delimit the source file", I see the following as defaults that I accept:

Row delimiter: {CR}{LF}   
Column delimiter: Comma {,}

The 500k rows and 158 columns come in just beautifully thanks to the wizard.

When scripting, I have tried all manner of \r\n variations and 0x0a etc and searched for help and reviewed similar posts. I can successfully script in a single line of real data, so I don’t think the column parsing is an issue. It fails when I try 2 lines in a dummy file. The least amount of errors I generate occurs with the following

bulk insert abc
from 'C:\TestDat\try2.dat'
with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '\r\n')
go

Msg 4863, Level 16, State 1, Line 3
Bulk load data conversion error (truncation) for row 1, column 158 (Column 157).

(using 0x0a will create more errors on a 2 line file).

.dat contents:

xxx,20080501,xxx,xxx:175000,55008654,0178636,202307011001,001859915,OK,W,xx,1.0,00000100,48,202306290100,R0,2,202306290200,R0,0,202306290300,R0,0,202306290400,R0,0,202306290500,R0,1,202306290600,R0,1,202306290700,R0,0,202306290800,R0,0,202306290900,R0,0,202306291000,R0,0,202306291100,R0,0,202306291200,R0,1,202306291300,R0,0,202306291400,R0,0,202306291500,R0,2,202306291600,R0,1,202306291700,R0,0,202306291800,R0,0,202306291900,R0,0,202306292000,R0,0,202306292100,R0,0,202306292200,R0,0,202306292300,R0,2,202306300000,R0,3,202306300100,R0,0,202306300200,R0,0,202306300300,R0,2,202306300400,R0,0,202306300500,R0,1,202306300600,R0,1,202306300700,R0,3,202306300800,R0,0,202306300900,R0,0,202306301000,R0,0,202306301100,R0,0,202306301200,R0,3,202306301300,R0,1,202306301400,R0,0,202306301500,R0,1,202306301600,R0,1,202306301700,R0,0,202306301800,R0,0,202306301900,R0,1,202306302000,R0,0,202306302100,R0,1,202306302200,R0,0,202306302300,R0,1,202307010000,R0,0
xxx,20080501,xxx,xxx:175000,55008654,0178636,202307011001,001859915,OK,W,xx,1.0,00000100,4,202307010100,R0,1,202307010200,R0,0,202307010300,R0,1,202307010400,R0,0

Solution

  • Ok, I created the try2.dat file,

    and i created a table abc using (I left out column3..column157, but they have same definition....);

    CREATE TABLE abc 
    (
        Column1 VARCHAR(200),
        Column2 VARCHAR(200),
        ...
        Column158 VARCHAR(200),
        Column159 VARCHAR(200)
    )
    

    Then started SQLCMD

    D:\TEMP>sqlcmd
    1> use test
    2> go
    Changed database context to 'TEST'.
    1> delete from abc;
    2>
    3> go
    
    (1 rows affected)
    1>
    2> bulk insert abc from 'd:\temp\try2.dat' with (FIRSTROW = 1, fieldterminator = ',', rowterminator = '\r\n')
    3> go
    
    (1 rows affected)
    

    When you are getting an error, I see two possibilities:

    When selecting the table in SSMS, I do see:

    [![start of the row][1]][1]

    [![end of the row][2]][2]

    The last column does not seem to be OK...... More investigation needed.... πŸ€”πŸ€”πŸ˜•

    A workaround could be:

    Create a table like this:

    CREATE TABLE def
    (
        Column1 VARCHAR(MAX)
    );
    

    and import using (since you do not have # characters in your file):

    bulk insert def from 'd:\temp\try2.dat' 
    with (FIRSTROW = 1, fieldterminator = '#', rowterminator = '\r\n')
    

    After that, you can do:

    SELECT * 
    FROM STRING_SPLIT((SELECT Column1 FROM def), ',');
    

    Resulting in:

    value
    xxx
    20080501
    xxx
    xxx:175000
    55008654
    0178636
    202307011001
    001859915
    OK
    W
    xx
    1.0
    00000100
    48
    202306290100
    R0
    ........

    NOTE: When your version of SQL is new enough, you can add the enable_ordinal parameter to STRING_SPLIT

    NOTE2: Why is this enable_ordinal added to STRING_SPLIT?

    Because in the old (or current 😒) way it works, STRING SPLIT does not "care" about ordinal position (https://stackoverflow.com/a/64092709/724039)

    I created a DBFIDDLE with your data, this has a line number added to def

    Part of the SELECT R, value FROM def d CROSS APPLY STRING_SPLIT((SELECT Column1 ), ','); (near the line change) is:

    R value
    ...
    1 1
    1 202307010400
    1 R0
    1 0
    2 xxx
    2 20080501
    2 xxx
    2 xxx:175000
    ...

    Unfortunately it is not possible (AFAIK) to add row number per line number in a reliable way...

    ANOTHER ATTEMPT:

    seeing: "The column count per row will equal 14+n*3 where n = the value in the 14th column."

    Define your import table (here named with the clumsy name def2:

    CREATE TABLE def2 (
      Column1 VARCHAR(200),
      Column2 VARCHAR(200),
      Column3 VARCHAR(200),
      Column4 VARCHAR(200),
      Column5 VARCHAR(200),
      Column6 VARCHAR(200),
      Column7 VARCHAR(200),
      Column8 VARCHAR(200),
      Column9 VARCHAR(200),
      Column10 VARCHAR(200),
      Column11 VARCHAR(200),
      Column12 VARCHAR(200),
      Column13 VARCHAR(200),
      Column14 VARCHAR(200),
      ColumnRest VARCHAR(MAX)
    );
    

    Import your data:

    bulk insert def2 from 'D:\Temp\try2.dat' with (fieldterminator = ',');
    

    Finally get the values, which can be used to be inserted in a (new) table:

    WITH def2tmp AS (
       SELECT ROW_NUMBER() OVER (ORDER BY ColumnRest) as R, Column14, RTRIM(ColumnRest) + ',' as ColumnRest
       FROM def2
    ), PosComma AS (
       SELECT 
           R , 
           Column14, 
           CHARINDEX(',',ColumnRest,1) as R2, 
           SUBSTRING(ColumnRest,1,CHARINDEX(',',ColumnRest,1)-1) as value
           ,ColumnRest
       FROM def2tmp
       UNION ALL
       SELECT 
           R, 
           Column14, 
           CHARINDEX(',',ColumnRest,R2+1), 
           SUBSTRING(ColumnRest,R2+1,CASE WHEN (CHARINDEX(',',ColumnRest,R2+1)-R2)-1 < 0 THEN 100 ELSE (CHARINDEX(',',ColumnRest,R2+1)-R2)-1 END)
           ,ColumnRest
       FROM PosComma
       WHERE R2<LEN(ColumnRest) and r<=2
    )
    SELECt 
       R, 
       R2, 
       ROW_NUMBER() OVER (PARTITION BY R ORDER BY R2) as R3,
       value as value 
    FROM PosComma
    ORDER BY column14 DESC,R2 
    OPTION (MAXRECURSION 20000)
    ;
    

    output (where R is the row number from your import table, and R3 is the row number of the values (R2 is the position of the , in ColumnRest)):

    P.S. SORRY for this messy output.... 😁

    This solution gets you two tables, the first def2 with the first 14 columns, the second with the values in that row (but you have to define that table yourself!)

    R R2 R3 value
    1 13 1 202306290100
    1 16 2 R0
    1 18 3 2
    1 31 4 202306290200
    1 34 5 R0
    1 36 6 0
    1 49 7 202306290300
    1 52 8 R0
    1 54 9 0
    1 67 10 202306290400
    1 70 11 R0
    1 72 12 0
    1 85 13 202306290500
    1 ... ... ......
    1 823 136 202306302200
    1 826 137 R0
    1 828 138 0
    1 841 139 202306302300
    1 844 140 R0
    1 846 141 1
    1 859 142 202307010000
    1 862 143 R0
    1 864 144 0
    2 13 1 202307010100
    2 16 2 R0
    2 18 3 1
    2 31 4 202307010200
    2 34 5 R0
    2 36 6 0
    2 49 7 202307010300
    2 52 8 R0
    2 54 9 1
    2 67 10 202307010400
    2 70 11 R0
    2 72 12 0