I wrote a short function with Oledb which should read large amount of data but there are still several issues I couldnt't solve, it's about reading and inserting semicolon separated data in sql database
Private Function GetCSVFile(ByVal file As String) As DataTable
Try
Dim dt As New DataTable
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim da As New OleDb.OleDbDataAdapter("Select * from " & _table & ".csv", conn)
da.Fill(dt)
Application.DoEvents()
getData = dt
Catch ex As OleDbException
MessageBox.Show(ex.Message)
Catch ex As Exception
MessageBox.Show(ex.Message)
End Try
Return getData
End Function
1. it reads whole file but i need to tell the function that it should read only 50.000 rows and pass them to another function step for step, it should probably better work in for loop because jet oledb doesn't read files larger than 1 GB
value(i) = value(i).Replace("\t", Constants.vbTab).Replace("\n", Constants.vbLf).Replace("\r", Constants.vbCr).Replace("\""", """").Replace("\\", "\")
but it usually works only for strings
I need to recognize data types like integer, double, string etc. my first idea was to do it over sql query and check the tables over tryparse
Dim dtInserts As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table) Dim ListOfTypes As New List(Of System.Type) For Each _col As DataColumn In dtInserts.Columns Dim _type As System.Type = _col.DataType ListOfTypes.Add(_type) Next Dim _wert1 As String = "11.11.2011" Dim _type1 As System.Type = ListOfTypes.Item(1) If DateTime.TryParse(_wert1, New Date) Then End If
but still not sure if it will work
All data that have been read should be encoded in 1252 Codepage. this one doesn't really work
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited;CODEPAGE=1252"""
anyone idea what could be done?
CSVHelper can do most of what it sounds like you want. FileHelpers might also work, I just haven't worked with it as much. I wouldn't recommend VB's TextFieldParser
because it returns a string array rather than typed data.
OleDB
provides a great way to import, but saving to the database presents a small challenge. All the loaded rows will have a RowState
of Unchanged
. The only way to change that to Added
is to copy the rows into a new table:
For Each dr As DataRow In dtCSV.Rows
dtDest.Rows.Add(dr.ItemArray)
Next
It will work, but as a result, you will have 2 tables and all those rows loaded at once. It turns out that using CSVHelper and a simple INSERT query is the most economical - since the records are fetched from an IEnumerable<T>
only 1 source record will be loaded at a time. It is also a bit faster than copying rows: about 20% faster on 500k rows.
Note: We have no idea whatsoever what the data looks like other than it is semicolon delimited...and there is apparently a lot of it.
Using sr As New StreamReader(CSVFilePath, False),
csv As New CsvReader(sr)
' some CSVHelper config options
csv.Configuration.HasHeaderRecord = True
csv.Configuration.TrimFields = True
csv.Configuration.TrimHeaders = True
csv.Configuration.Delimiter = ";"
csv.Configuration.IsHeaderCaseSensitive = False
csv.Configuration.RegisterClassMap(Of RandItem.RandItemMap)()
' get the file into IEnumerable collection
Dim csvData = csv.GetRecords(Of OleImportItem)()
Dim SQL = <sql>
INSERT INTO RandomData
(Foo, Bar, Cat, Dog...)
VALUES
(@p1, @p2, @p3, @p4...)
</sql>.Value
Using dbcon As New OleDbConnection(ACEConnStr)
Using cmd As New OleDbCommand(SQL, dbcon)
dbcon.Open()
' create the parameters
cmd.Parameters.Add("@p1", OleDbType.VarChar)
cmd.Parameters.Add("@p2", OleDbType.VarChar)
cmd.Parameters.Add("@p3", OleDbType.Integer)
cmd.Parameters.Add("@p4", OleDbType.Integer)
...
' load one item at a time, to save it
For Each item In csvData
cmd.Parameters("@p1").Value = item.Foo
cmd.Parameters("@p2").Value = item.Bar
cmd.Parameters("@p3").Value = item.Cat
cmd.Parameters("@p4").Value = item.Dog
...
cmd.ExecuteNonQuery()
Next
End Using
End Using
End Using
I cant give a complete tutorial on how to use it, but in general you create a type (Class) which defines the datatype for each column (here,RandItem
), the RandItemMap
is another class which specifies the order of those properties in the file. In so doing, CSVHelper knows the datatype of each column and will convert for you.
There are several ways to use it, this way is reading one line at a time from the file and immediately saves that item to the database: csvData = csv.GetRecords(Of OleImportItem)()
initializes csvData
as an IEnumerable(Of RandItem)
, so only one item is loaded at a time in the loop which makes it very economical.
In the loop, the code gets a typed item
from the csv file which is then mapped to the respective parameters and saved. This is about 20% faster than copying the data to a DataTable in batches and saving; and much less memory intensive since there is only ever 1 item loaded.
For smaller csv files, you could use .ToArray()
or ToList()
to load the file to a collection.
Note Importing/parsing a text file without headers is a little different than shown, but just as easy.
It sounds like CSVHelper might obviate most issues in your laundry list.