sql-server csv encoding bcp data-transfer

"Unexpected EOF encountered in BCP data-file" when importing large CSV exported from SQLite


I'm facing a persistent issue when trying to import a .csv file into SQL Server using bcp. The operation always fails with the following error:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

The CSV was exported from a SQLite table with approximately 65 million rows and then processed with a Python script to normalize its data.

Due to data sensitivity, I cannot share real rows, but here's a fictitious example that represents the real structure and data types (column names are in Portuguese; please disregard that):

12345678|EMPRESA TESTE LTDA|2062|49|01|nan|1000.0

Which corresponds to:

| Column                        | Example            | SQL Server Type           |
| ----------------------------- | ------------------ | ------------------------- |
| `cnpj_basico`                 | 12345678           | `INT`                     |
| `razao_social`                | EMPRESA TESTE LTDA | `VARCHAR(MAX)`            |
| `natureza_juridica`           | 2062               | `SMALLINT`                |
| `qualificacao_responsavel`    | 49                 | `SMALLINT`                |
| `porte_empresa`               | 01                 | `VARCHAR(2)`              |
| `ente_federativo_responsavel` | nan                | `VARCHAR(100)` (nullable) |
| `capital_social`              | 1000.0             | `DECIMAL(18,2)`           |

The first line in the file is the header.

What I’ve already checked:

  1. Verified the file encoding (UTF-8, without BOM).
  2. Generated the normalized CSV with pandas.to_csv() using:
     sep='|', lineterminator='\r\n', quoting=csv.QUOTE_NONE, na_rep=''
  3. Validated that all lines have the same number of columns as the header (a sketch of that check follows this list).
  4. BCP command used (real parameters have been replaced):

bcp test.table in "file.csv" -S {SERVERNAME} -d {DATABASE} -U {USER} -P {PWD} -c -t "|" -r "\r\n" -F 2 -q -e C:\error_file.txt

  5. Verified that the file ends with \r\n and contains no incomplete lines.
  6. Validated that the table has exactly 7 columns, matching the CSV.
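For reference, a minimal sketch of the column-count check from item 3; it streams the file row by row, so the ~4GB never has to fit in memory (file name is illustrative):

import csv

EXPECTED = 7  # number of columns per row, taken from the header

with open('file.csv', 'r', encoding='utf-8', newline='') as f:
    reader = csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if len(row) != EXPECTED:
            print(f'line {lineno}: {len(row)} fields instead of {EXPECTED}')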

(By the way, even after using

chunk.fillna('', inplace=True)

pandas still exports some empty fields as literal "nan" strings. This may be part of the issue if BCP is interpreting them as strings rather than empty fields.)
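For what it's worth, fillna() only replaces real NaN values; if a column was cast to str at an earlier step, the cells already contain the literal text 'nan' and fillna() never touches them. A minimal sketch that reproduces and fixes this (toy data, column name reused from the example above):

import pandas as pd

# toy chunk: one real NaN that a later astype(str) turns into the text 'nan'
chunk = pd.DataFrame({'ente_federativo_responsavel': [float('nan'), 'X']})
chunk = chunk.astype(str)           # NaN becomes the literal string 'nan'
chunk = chunk.replace({'nan': ''})  # fillna('') would miss these; replace() catches them
print(chunk)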

So my final question is:

What else could cause this "Unexpected EOF" in BCP even after addressing encoding, delimiters, headers, row terminators, and missing-value treatment? Could encoding still be the problem, even though I've checked and it is most likely UTF-8?

Has anyone dealt with this error in large-scale CSV loads? Could BCP be mishandling quoted nan values or line terminators under certain conditions?

And finally, does anyone know of another method that would allow me to safely transfer huge databases from SQLite to Azure SQL Server? I've tried DB Browser and the Azure Import Wizard as well, but to no avail.

Any guidance would be highly appreciated.

Additional details I've seen people ask for in similar questions:
1. bcp.exe version: 16.0.1000.6
2. I cannot open the file in any text editor: at ~4GB it instantly crashes every editor I try (see the byte-check sketch after the sample script).
3. I've tried both comma and pipe separators.
4. I've checked that it is UTF-8 both by reading the first 3 bytes and with chardet.
5. A test sample can be created with this script, which imitates the file I currently have:

import csv

with open('sample.csv', 'w', newline='', encoding='utf-8') as f:
    # csv.writer defaults to the '\r\n' line terminator, matching the real file
    writer = csv.writer(f, delimiter='|', quoting=csv.QUOTE_NONE, escapechar='\\')

    # Header
    writer.writerow([
        'cnpj_basico',
        'razao_social',
        'natureza_juridica',
        'qualificacao_responsavel',
        'porte_empresa',
        'ente_federativo_responsavel',
        'capital_social'
    ])

    # 100 simulated rows
    for i in range(100):
        writer.writerow([
            f'{10000000 + i}',
            f'EMPRESA {chr(65 + (i % 26))}',   # * see note below
            '2062',
            '49',
            '01',
            'nan',  # NaN in the source ends up here as the literal string 'nan'
            f'{1000.0 + i * 10}'
        ])

* Note that this is the only field that genuinely needs to be text and could contain problematic commas or quotation marks, but it is nearly impossible to check all 65 million rows. The other text columns are typed as text only as a precaution.
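Since the file cannot be opened in an editor, here is roughly how the byte-level checks from the "Additional details" list can be done, reading only a few bytes from each end of the file (file name is illustrative):

with open('file.csv', 'rb') as f:
    head = f.read(3)   # a UTF-8 BOM would be b'\xef\xbb\xbf'
    f.seek(-2, 2)      # jump to the last two bytes instead of reading ~4GB
    tail = f.read(2)

print('has BOM:', head == b'\xef\xbb\xbf')
print('ends with CRLF:', tail == b'\r\n')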


Solution

  • Usually BCP raises "Unexpected EOF" because the row terminator isn't specified correctly. In your case, -r "\r\n" is not correct, because bcp uses the \n specifier to mean "\r\n" (probably because it was originally Windows-only).

    Luckily, bcp uses the \r\n separator by default, so by omitting the -r argument entirely, you can import your file with \r\n newlines.

    It's all explained, albeit a bit confusingly, in the docs: https://learn.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server?view=sql-server-ver16#specify-n-as-a-row-terminator-for-bulk-import
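    To make it concrete, this is the command from the question with the -r argument simply removed (same placeholders as above):

    bcp test.table in "file.csv" -S {SERVERNAME} -d {DATABASE} -U {USER} -P {PWD} -c -t "|" -F 2 -q -e C:\error_file.txt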