.netvb.netjet

Import to DataTable using OleDB/Jet


I wrote a short function with Oledb which should read large amount of data but there are still several issues I couldnt't solve, it's about reading and inserting semicolon separated data in sql database

Private Function GetCSVFile(ByVal file As String) As DataTable

        Try
            Dim dt As New DataTable
            Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited"""
            Dim conn As New OleDb.OleDbConnection(ConStr)

            Dim da As New OleDb.OleDbDataAdapter("Select * from " & _table & ".csv", conn)
            da.Fill(dt)
            Application.DoEvents()
            getData = dt
        Catch ex As OleDbException
            MessageBox.Show(ex.Message)
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
        Return getData
    End Function

1. it reads whole file but i need to tell the function that it should read only 50.000 rows and pass them to another function step for step, it should probably better work in for loop because jet oledb doesn't read files larger than 1 GB

  1. i need to replace characters
    value(i) = value(i).Replace("\t", Constants.vbTab).Replace("\n", Constants.vbLf).Replace("\r", Constants.vbCr).Replace("\""", """").Replace("\\", "\")

but it usually works only for strings

  1. I need to recognize data types like integer, double, string etc. my first idea was to do it over sql query and check the tables over tryparse

    
      Dim dtInserts As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)
                Dim ListOfTypes As New List(Of System.Type)
         For Each _col As DataColumn In dtInserts.Columns
                    Dim _type As System.Type = _col.DataType
                    ListOfTypes.Add(_type)
                Next
                Dim _wert1 As String = "11.11.2011"
                Dim _type1 As System.Type = ListOfTypes.Item(1)
                If DateTime.TryParse(_wert1, New Date) Then
                End If
    

    but still not sure if it will work

  2. All data that have been read should be encoded in 1252 Codepage. this one doesn't really work

     Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited;CODEPAGE=1252""" 

anyone idea what could be done?


Solution

  • CSVHelper can do most of what it sounds like you want. FileHelpers might also work, I just haven't worked with it as much. I wouldn't recommend VB's TextFieldParser because it returns a string array rather than typed data.

    OleDB provides a great way to import, but saving to the database presents a small challenge. All the loaded rows will have a RowState of Unchanged. The only way to change that to Added is to copy the rows into a new table:

    For Each dr As DataRow In dtCSV.Rows
        dtDest.Rows.Add(dr.ItemArray)
    Next
    

    It will work, but as a result, you will have 2 tables and all those rows loaded at once. It turns out that using CSVHelper and a simple INSERT query is the most economical - since the records are fetched from an IEnumerable<T> only 1 source record will be loaded at a time. It is also a bit faster than copying rows: about 20% faster on 500k rows.

    Note: We have no idea whatsoever what the data looks like other than it is semicolon delimited...and there is apparently a lot of it.

    Using sr As New StreamReader(CSVFilePath, False),
         csv As New CsvReader(sr)
    
        ' some CSVHelper config options
        csv.Configuration.HasHeaderRecord = True
        csv.Configuration.TrimFields = True
        csv.Configuration.TrimHeaders = True
        csv.Configuration.Delimiter = ";"
        csv.Configuration.IsHeaderCaseSensitive = False
        csv.Configuration.RegisterClassMap(Of RandItem.RandItemMap)()
    
        ' get the file into IEnumerable collection
        Dim csvData = csv.GetRecords(Of OleImportItem)()
    
        Dim SQL = <sql>
                  INSERT INTO RandomData 
                         (Foo, Bar, Cat, Dog...)
                  VALUES 
                        (@p1, @p2, @p3, @p4...)
                </sql>.Value
    
        Using dbcon As New OleDbConnection(ACEConnStr)
            Using cmd As New OleDbCommand(SQL, dbcon)
                dbcon.Open()
    
                ' create the parameters
                cmd.Parameters.Add("@p1", OleDbType.VarChar)
                cmd.Parameters.Add("@p2", OleDbType.VarChar)
                cmd.Parameters.Add("@p3", OleDbType.Integer)
                cmd.Parameters.Add("@p4", OleDbType.Integer)
                ...
                ' load one item at a time, to save it
                For Each item In csvData
                    cmd.Parameters("@p1").Value = item.Foo
                    cmd.Parameters("@p2").Value = item.Bar
                    cmd.Parameters("@p3").Value = item.Cat
                    cmd.Parameters("@p4").Value = item.Dog
                    ...
                    cmd.ExecuteNonQuery()
                Next
            End Using
        End Using
    End Using
    

    I cant give a complete tutorial on how to use it, but in general you create a type (Class) which defines the datatype for each column (here,RandItem), the RandItemMap is another class which specifies the order of those properties in the file. In so doing, CSVHelper knows the datatype of each column and will convert for you.

    There are several ways to use it, this way is reading one line at a time from the file and immediately saves that item to the database: csvData = csv.GetRecords(Of OleImportItem)() initializes csvData as an IEnumerable(Of RandItem), so only one item is loaded at a time in the loop which makes it very economical.

    In the loop, the code gets a typed item from the csv file which is then mapped to the respective parameters and saved. This is about 20% faster than copying the data to a DataTable in batches and saving; and much less memory intensive since there is only ever 1 item loaded.


    For smaller csv files, you could use .ToArray() or ToList() to load the file to a collection.

    Note Importing/parsing a text file without headers is a little different than shown, but just as easy.

    It sounds like CSVHelper might obviate most issues in your laundry list.