I have an windows form app that takes CSV/Excel files, lets users import them to a SQL DB.
Recently, I have had a weird bug where users upload a CSV file and it will truncate the text in a column.
Here is the source file:
Here is the DatagridView in my app, displaying the same data after it is converted to a datatable:
Notice some values are completely blank and highlighted in red, while other seem to be truncated. However, this only happens with CSV files, not excel. This leads me to believe it may be a driver issue.
Here is the code that converts the flat file data to a datatable:
Private Function ConvertCSVToDataTable(ByVal path As String) As DataTable
Using con As OleDb.OleDbConnection = New OleDb.OleDbConnection()
Try
If System.IO.Path.GetExtension(path) = ".csv" Then
con.ConnectionString = String.Format("Provider={0};Data Source={1};Extended Properties=""Text;HDR=YES;FMT=Delimited""", "Microsoft.Jet.OLEDB.4.0", IO.Path.GetDirectoryName(path))
Using cmd As OleDb.OleDbCommand = New OleDb.OleDbCommand("SELECT * FROM [" & IO.Path.GetFileName(path) & "]", con)
Using da As OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter(cmd)
con.Open()
da.Fill(dt)
con.Close()
End Using
End Using
ElseIf System.IO.Path.GetExtension(path) = ".xlsx" Then
con.ConnectionString = String.Format("Provider={0};Data Source={1};Extended Properties=""Excel 12.0 Xml;HDR=Yes;IMEX=1""", "Microsoft.ACE.OLEDB.12.0", path)
con.Open()
Dim dbSchema As DataTable = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
Dim firstSheetname As String = dbSchema.Rows(0)("TABLE_NAME").ToString
Using cmd As OleDb.OleDbCommand = New OleDb.OleDbCommand("SELECT * FROM [" & firstSheetname & "]", con)
Using da As OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter(cmd)
'con.Open()
da.Fill(dt)
con.Close()
End Using
End Using
End If
Catch ex As Exception
MessageBox.Show(ex.ToString(), "Conversion Error", MessageBoxButtons.OK, MessageBoxIcon.Asterisk)
Finally
If con IsNot Nothing AndAlso con.State = ConnectionState.Open Then
con.Close()
End If
End Try
End Using
Return dt
End Function
Any idea whats causing this? I should also note that some of my users can only import CSV and not Excel. I figured out users with a 32 bit Access driver could import excel files, while 64 bit users could not. I had them download the driver here:
https://www.microsoft.com/en-us/download/details.aspx?id=23734
Now those users can import Excel, but they still have the issue with the strings being truncated. Which leads me to believe it may still be a driver issue.
Test Data:
Sales Order #
US00123
US00123
US00123
SG0000123
SG0000123
S00123
S00123
S00123
S00123
S00123
I have been able to reproduce your issue.
The problem is that for some reason, the Text File Driver used by the Jet provider is interpreting/parsing "S00123" as a numeric value. I had to configure the registry with MaxScanRows=0 and ImportMixedTypes="Majority Type".
I do not know why this is happening, but I just attribute it to another one of the joys of using a poorly documented technology. It appears that any field that starts with a "S" followed by digits is interpreted as a number.
If you insist on using this technolgy to accomplish a task for which there are far better options, then you have to live with its many shortcommings and quirks.
Solution 1:
Configure the provider to import with registry value ImportMixedTypes set to "Text". Now if you make this change to allow your program to work, you will also be responsible for breaking any other code that is reliant on the existing configuration.
Location of the registry values on a 64 bit OS.
For Jet Provider:
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Jet\4.0\Engines\Text
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Jet\4.0\Engines\Excel
For ACE Provider:
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Text
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel
Solution 2:
Use a [Schema.ini]file https://learn.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver) to define how the text file is to be interpreted with defined columns.
This is the preferred method for using this technology. Be advised that encoding of the file is very important; it must either in the .Net encoding System.Text.Encoding.ASCII
or System.Text.Encoding.Unicode
. If you use another encoding such as UTF-8
, the file will not be read and the settings in the registry will be used.