sql-serverbulkinsertbcp

SQL Server BCP Bulk insert Pipe delimited with text qualifier format file


I have a csv file which is vertical pipe delimited with every column also with a text qualifier of ".

I have been trying for ages to try and get the BCP format file to work, but no luck.

I have the following staging table:

[ID] [VARCHAR](100) NULL,
[SUB_ID] [NUMERIC](18, 0) NULL,
[CODE1] [VARCHAR](20) NULL,
[CODE2] [NUMERIC](18, 0) NULL,
[DATE] [DATE] NULL

Data in csv:

"ID"|"SUB_ID"|"CODE1"|"CODE2"|"DATE"
"HAJHD87SADAD9A87SD9ADAS978DAA89D09AS"|"7510"|"N04FY-1"|"359420013"|"08/08/2018"

Format file:

14.0
5
1   SQLCHAR   0  0   '"|"'  1  ID      ""
2   SQLCHAR   0  0   '"|"'  2  SUB_ID  ""
3   SQLCHAR   0  0   '"|"'  3  CODE1   SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0  0   '"|"'  4  CODE2   ""
5   SQLCHAR   0  0   '"\n"' 5  DATE    ""

When I try to execute using the following SQL statement:

BULK INSERT [dbo].[TEST]
FROM 'G:\DATA\TABLE.csv'  
WITH (FIRSTROW = 2,
      FORMATFILE = 'G:\DATA\TEST.fmt')

I get this error

Msg 4866, Level 16, State 8, Line 1
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.

Msg 7301, Level 16, State 2, Line 1
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".

I can't work out where it is going wrong. Is it a data type mismatch or are my FIELDTERMINATOR and ROWTERMINATOR incorrect? Any ideas will be greatly received, I've tried so many combinations.


Solution

  • First, the BCP program only recognizes the doublequote as the container of the delimiter. So, using the single quote is causing an error.

    Second, since the delimiter you want to specify "|" includes the doublequote character that BCP requires you to use to use to enclose your delimiter, you have to use the escape character to get the BCP program to ignore the quotes you want to use as delimiters. The escape character is the backslash character. So...

    Instead of "|"... use... "\"|\""

    This will tell BCP to ignore the doublequots preceded by a backslash and just treat them as any other character.

    Third, you must account for the first field that has a preceding doublequote. The "|" terminator I mentioned above wont account for the opening doublequote at the beginning of each line for the first field.

    To handle that you must add a "dummy" field to your format file and assign it's terminator as \" (or actually "\"" in the format file). Then, since you now have one more field in the file than you have in the table, you must offset your column numbering to tell BCP skip this new field that is terminated by the first doublequote in the file

    Last, the last field is not terminated by just a newline character "\n". It is also termed by a doublequote (no pipe character included). So, we must customimze the final field terminator (which is actually the line/row terminator). Like this "\"\n".

    Your format file will now look like this:

    14.0
    5
    1   SQLCHAR   0  0   "\""  0  dummy_field   ""
    2   SQLCHAR   0  0   "\"|\""  1  ID      ""
    3   SQLCHAR   0  0   "\"|\""  2  SUB_ID  ""
    4   SQLCHAR   0  0   "\"|\""  3  CODE1   SQL_Latin1_General_CP1_CI_AS
    5   SQLCHAR   0  0   "\"|\""  4  CODE2   ""
    6   SQLCHAR   0  0   "\"\n" 5  DATE    ""
    

    I hope that helps.