.netvb.netdatatabledatasetdatarow

What is the optimal way to filter rows from a DataTable?


I'm looking for the easiest and fastest way to get DataRows from a DataTable. These DataRows have several columns including one with Integer and I only want the rows with the highest value.

Currently i get this result like this :

Dim maxValue = 0
For Each row In mDataTable.Rows
If row.valueCell > maxValue Then
    maxValue = row.valueCell
    End If
Next
Dim mDataTableBis = mDataTable.Clone
For Each row In mDataTable.Select("value = " & valueCell)
    mDataTableBis.ImportRow(row)
Next

Original DataTable (for example):

Rows letters value
row 1 (wanted) x 4
row 2 y 2
row 3 (wanted) z 4

Solution

  • If you prefer the for-each loop approach, like it seems from the code you posted, use this function:

    Public Sub ForEachLoop()
    
        Dim maxValue As Integer
    
        ' Loop through rows to find max value
        For Each row As DataRow In mDataTable.Rows
            Dim currentRowValue As Integer = row.Field(Of Integer)(numbersColumn)
            If currentRowValue > maxValue Then
                maxValue = currentRowValue
            End If
        Next
    
        ' Create a List of DataRow
        Dim res As New List(Of DataRow)
    
        ' Loop through rows again to add to list each row in which numbersColumn field = maxValue
        For Each row As DataRow In mDataTable.Rows
            If row.Field(Of Integer)(numbersColumn) = maxValue Then
                res.Add(row)
            End If
        Next
    
        ' Create results DataTable copying the List to a new DataTable
        Dim result As DataTable = res.CopyToDataTable()
    
    End Sub
    

    If you would like something more concise, you can try with LINQ but this is a more advanced solution, surely not beginner-friendly.

    Something like:

    Public Sub LINQ()
    
        ' Find the row with highest value in numbersColumn, then get the value from the field
        Dim maxValue As Integer = mDataTable.AsEnumerable().MaxBy(Function(x) x.Field(Of Integer)(numbersColumn)).Field(Of Integer)(numbersColumn)
    
        ' Select all the rows with numbersColumn value = maxValue (this returns an IEnumerable of DataRow)
        Dim resultsRows As IEnumerable(Of DataRow) = mDataTable.AsEnumerable().Where(Function(x) x.Field(Of Integer)(numbersColumn) = maxValue)
    
        ' Create results DataTable copying the IEnumerable to a new DataTable
        Dim result As DataTable = resultsRows.CopyToDataTable()
    
    End Sub
    

    You could also use the DataTable.Select() method...

    Public Sub DataTableSelect()
    
        Dim result As DataTable = mDataTable.Select("numbersColumn = max(numbersColumn)").CopyToDataTable()
    
    End Sub
    

    ...but this is significally slower than a for each loop:

    Benchmark

    As you can see, the for each version is around 3500 times faster than the Select() version - that's a big difference! This test has been executed on a relatively small DataTable with around 10k rows - imagine how big the difference could be in a million-rows-DataTable.