Tags: mysql, vb.net, bulk-insert, bulkloader

MySqlBulkLoader from DataTable - VB.net


I would like to import about 9,000,000 rows from an XML file into a MySQL server. Currently I'm inserting the data row by row, which is very slow: about 50 rows/sec, which means it's going to take days to complete. For another project I loaded similar data into a DataTable 5,000 rows at a time and then bulk-inserted all 5,000 rows at once, which reached about 7,500 rows/sec. The problem is that was with SQL Server, and this is MySQL. I can't find anyone using the MySqlBulkLoader class with a DataTable. Is this possible, and how would I go about doing it?

Example of the first 750 elements in the XML file: http://view.qrdetector.dk/test.xml

These are the columns I need in my database from the XML file:

    'Create datatable to hold the information from the XML file
    Dim ReadXML_DT As New DataTable
    ReadXML_DT.Columns.Add("KoeretoejIdent", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejArtNavn", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejAnvendelseNavn", GetType(String))
    ReadXML_DT.Columns.Add("RegistreringNummerNummer", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejOplysningStatus", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejOplysningFoersteRegistreringDato", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejOplysningStelNummer", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejMaerkeTypeNavn", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejModelTypeNavn", GetType(String))
    ReadXML_DT.Columns.Add("KoeretoejVariantTypeNavn", GetType(String))
    ReadXML_DT.Columns.Add("DrivkraftTypeNavn", GetType(String))
    ReadXML_DT.Columns.Add("SynResultatSynsType", GetType(String))
    ReadXML_DT.Columns.Add("SynResultatSynsDato", GetType(String))
    ReadXML_DT.Columns.Add("SynResultatSynStatusDato", GetType(String))
    ReadXML_DT.Columns.Add("SidsteSynTjek", GetType(String))

I've manually made a CSV file with 130,000 rows containing the 15 columns I need. Then I used the bulk-insert code from Plutonix's reply. I'm now able to load the 130,000 rows in about 215 sec, which gives me an average speed of about 600 rows/sec. This is pretty much the same result as before. Is this because of my connection to the MySQL server?

Dim sw As Stopwatch = New Stopwatch
sw.Start()

' Finally, BulkLoad
Dim cols As String() = {"KoeretoejIdent", "KoeretoejArtNavn", "KoeretoejAnvendelseNavn", "RegistreringNummerNummer", "KoeretoejOplysningStatus", "KoeretoejOplysningFoersteRegistreringDato", "KoeretoejOplysningStelNummer", "KoeretoejMaerkeTypeNavn", "KoeretoejModelTypeNavn", "KoeretoejVariantTypeNavn", "DrivkraftTypeNavn", "SynResultatSynsType", "SynResultatSynsDato", "SynResultatSynStatusDato", "SidsteSynTjek"}
Dim rows As Integer = 0
Using dbcon As New MySqlConnection(connectionString)
    Dim bulk = New MySqlBulkLoader(dbcon)

    bulk.TableName = "synsbasen_testLoad"
    bulk.FieldTerminator = "^"
    bulk.LineTerminator = "\r\n"    ' == CR/LF
    bulk.FileName = "C:/Users/Synsbasen/Desktop/abc.csv"         ' full file path name to CSV 
    bulk.NumberOfLinesToSkip = 1    ' has a header (default)

    bulk.Columns.Clear()
    For Each s In cols
        bulk.Columns.Add(s)         ' specify col order in file
    Next

    rows = bulk.Load()
End Using
sw.Stop()
' SW is a stopwatch
MsgBox(rows & " rows converted and loaded in " & sw.Elapsed.TotalSeconds & " secs")
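One thing worth checking: plain `LOAD DATA INFILE` expects the file to exist on the *server*, so if the MySQL server is remote, the loader must be told to send the local file over the connection instead. A sketch, assuming Connector/NET's `MySqlBulkLoader.Local` property and (on newer connector versions) the `AllowLoadLocalInfile` connection-string option; server, credentials, and paths are placeholders:

```vbnet
' Sketch: stream a local CSV to a remote server via LOAD DATA LOCAL INFILE.
' AllowLoadLocalInfile support depends on the connector version in use.
Dim connStr As String = "server=remotehost;uid=user;pwd=pass;database=db;AllowLoadLocalInfile=true"
Using dbcon As New MySqlConnection(connStr)
    Dim bulk As New MySqlBulkLoader(dbcon) With {
        .TableName = "synsbasen_testLoad",
        .FieldTerminator = "^",
        .LineTerminator = "\r\n",
        .FileName = "C:/Users/Synsbasen/Desktop/abc.csv",
        .NumberOfLinesToSkip = 1,
        .Local = True   ' client reads the file and sends it to the server
    }
    Dim rows As Integer = bulk.Load()
End Using
```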

Solution

  • This will read in one million rows from XML, extract a subset of the data, export to CSV (using CSVHelper), then load them to MySql using MySqlBulkLoader in about 30 seconds.

    ' IEnumerable of the data parts to import
    Dim recList As IEnumerable(Of SmSample)
    
    ' load some columns as a class
    Using fs As FileStream = File.OpenRead(XMLFile)
        Dim xDoc = XDocument.Load(fs)
    
        ' its IEnumerable - leave it that way
        recList = xDoc.Descendants("Sample").
                   Select(Function(j) New SmSample With {.Name = j.Element("Name").Value,
                                     .Descr = j.Element("Descr").Value,
                                     .Price = Decimal.Parse(j.Element("Price").Value),
                                     .ItemDate = DateTime.Parse(j.Element("ItemDate").Value)
                                                        }
                          )
    End Using
    
    ' Have CSVHelper write them out
    ' this is the most time-consuming part, what with disk IO and all
    Using strW As New StreamWriter(CSVFile)
        Using csv As New CsvWriter(strW)
    
            ' ToDo: add other things like Field separators etc
            csv.Configuration.RegisterClassMap(Of SmSample.CSVItemMap)()
    
            csv.WriteRecords(recList)
        End Using
    End Using
    
    ' Finally, BulkLoad
    Dim cols As String() = {"Name", "Descr", "Price", "ItemDate"}
    Dim rows As Int32 = 0
    Using dbcon As New MySqlConnection(MySQLConnStr)
        Dim bulk = New MySqlBulkLoader(dbcon)
    
        bulk.TableName = "Sample"
        bulk.FieldTerminator = ","
        bulk.LineTerminator = "\r\n"    ' == CR/LF
        bulk.FileName = CSVFile         ' full file path name to CSV 
        bulk.NumberOfLinesToSkip = 1    ' has a header (default)
    
        bulk.Columns.Clear()
        For Each s In cols
            bulk.Columns.Add(s)         ' specify col order in file
        Next
    
        rows = bulk.Load()
    End Using
    
    ' SW is a stopwatch
    Console.WriteLine("{0} rows converted and loaded in {1} secs", 
                              rows, sw.ElapsedMilliseconds / 1000)
    

    Yours will take longer naturally, but 1,000,000 rows is already fairly large, so it should scale decently. If you want to load them in batches, use something like 500k rows at a time; it would be easy to write the CSV in parts using Skip() and Take().
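    Batching with Skip()/Take() could look roughly like this (the batch size and file naming are arbitrary choices, and each batch file would then go through the same MySqlBulkLoader step as above):

    ```vbnet
    ' Sketch: write the CSV in 500k-row batches using Skip/Take.
    Dim batchSize As Integer = 500000
    Dim batch As Integer = 0
    Do
        Dim chunk = recList.Skip(batch * batchSize).Take(batchSize).ToList()
        If chunk.Count = 0 Then Exit Do

        Using strW As New StreamWriter(String.Format("batch{0}.csv", batch))
            Using csv As New CsvWriter(strW)
                csv.Configuration.RegisterClassMap(Of SmSample.CSVItemMap)()
                csv.WriteRecords(chunk)
            End Using
        End Using

        batch += 1
    Loop
    ```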

    The interim steps took 9 secs to select the parts from the XML, 15 secs to write the CSV, 7 secs for MySQL to load the data.

    The specifics of your XML file are a mystery (the link was added to the question well after this was posted). Since you can load it into a DataTable, the test just used the result of myDT.WriteXml(...), so you may have to change that part. Leave it as a LINQ query and let CSVHelper consume it, since the app itself doesn't need the collection data at all.
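    If the data is already sitting in a DataTable (like ReadXML_DT from the question), the CSVHelper step could instead be a direct dump of the rows; a sketch, using '^' as the delimiter to match the question and assuming no field contains that character:

    ```vbnet
    ' Sketch: write a DataTable straight to a delimited file for the loader.
    ' The header row is then skipped via bulk.NumberOfLinesToSkip = 1.
    Using sw As New StreamWriter(CSVFile)
        Dim headers = ReadXML_DT.Columns.Cast(Of DataColumn)().
                          Select(Function(c) c.ColumnName)
        sw.WriteLine(String.Join("^", headers))
        For Each row As DataRow In ReadXML_DT.Rows
            sw.WriteLine(String.Join("^", row.ItemArray))
        Next
    End Using
    ```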

    The only "trick" was to format the DateTime field so MySQL could parse it:

    Public Class CSVItemMap
        Inherits CsvClassMap(Of SmSample)
    
        Public Sub New()
            AutoMap()
            Map(Function(m) m.ItemDate).TypeConverterOption("yyyy-MM-dd")
        End Sub
    
    End Class
    

    CSVHelper is very cool, very powerful and integral to the results.

    SmSample is just a class with the properties shown, holding the desired columns from the larger XML data. Its roles are to a) shed the extraneous columns you don't need/want as soon as possible and b) provide the 'holder' for the data for CSVHelper to act upon.
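    For completeness, SmSample might look like this (property types inferred from the parsing code above; the nested CSVItemMap is the class map registered with CSVHelper):

    ```vbnet
    ' Sketch of the SmSample holder class described above.
    Public Class SmSample
        Public Property Name As String
        Public Property Descr As String
        Public Property Price As Decimal
        Public Property ItemDate As DateTime

        Public Class CSVItemMap
            Inherits CsvClassMap(Of SmSample)

            Public Sub New()
                AutoMap()
                Map(Function(m) m.ItemDate).TypeConverterOption("yyyy-MM-dd")
            End Sub
        End Class
    End Class
    ```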

    The answer also uses the built-in MySqlBulkLoader tool, which I find easier to use than the raw SQL LOAD DATA LOCAL INFILE form.