datagridview duplicates vb.net-2010 virtualmode

How do you QUICKLY remove duplicates from a large DataGridView?


I need to be able to remove duplicate entries in a DataGridView quickly. Unfortunately, the way I am doing it can take a few minutes with anything above 100K items.

Here is the code I am using:

    Dim wordlist As New List(Of String)
    Dim numCols As Integer = DataGridView1.ColumnCount
    Dim numRows As Integer = DataGridView1.RowCount - 1
    Dim wordlist2 As New List(Of String)

    For count As Integer = 0 To numRows - 1
        wordlist.Add(DataGridView1.Rows(count).Cells("url").Value)
    Next

    For Each word As String In wordlist
        If Not wordlist2.Contains(word) Then
            wordlist2.Add(word)
        End If
    Next

    fullitem.Clear()

    For Each word2 As String In wordlist2
        fullitem.Add(New item(word2, "", ""))

    Next

    DataGridView1.RowCount = fullitem.Count + 1
    MessageBox.Show("Done!")

The datagridview is in virtual mode to support massive amounts of data.

If anyone could help me figure out a fast way to remove dupes I would really appreciate it.


Solution

  • Instead of first adding every value to wordlist and then looping through that to build a second, duplicate-free list, just check for duplicates when you add to the first list. The unique values then go into fullitem (no idea what that is, you don't show it).

    This way, the separate de-duplication loop is gone.

    Dim wordlist As New List(Of String)
    Dim numRows As Integer = DataGridView1.RowCount - 1
    Dim word As String

    For count As Integer = 0 To numRows - 1
        ' Value is an Object; CStr keeps this compiling under Option Strict On.
        word = CStr(DataGridView1.Rows(count).Cells("url").Value)
        If Not wordlist.Contains(word) Then
            wordlist.Add(word)
        End If
    Next

    ' Assumption: fullitem backs the virtual-mode grid, so it is cleared
    ' only after every cell has been read.
    fullitem.Clear()
    For Each uniqueWord As String In wordlist
        fullitem.Add(New item(uniqueWord, "", ""))
    Next

    DataGridView1.RowCount = fullitem.Count + 1
    MessageBox.Show("Done!")
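    With 100K+ rows, most of the remaining time is probably spent in List.Contains, which scans the whole list on every call. A HashSet(Of String) does the same membership check in roughly constant time. The following is a minimal sketch of that idea rather than a drop-in replacement: the module name, RemoveDuplicates, and the sample URLs are made up for illustration, and it works on a plain sequence of strings so it makes no assumptions about fullitem or item.

```vbnet
Imports System
Imports System.Collections.Generic

Module DedupeSketch
    ' Hypothetical helper: keeps the first occurrence of each string and
    ' preserves the original order.
    Function RemoveDuplicates(urls As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)()
        Dim result As New List(Of String)()
        For Each url As String In urls
            ' HashSet.Add returns False when the value is already present,
            ' so a single call both tests for and records the value.
            If seen.Add(url) Then
                result.Add(url)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Sample data standing in for the "url" column.
        Dim urls As New List(Of String) From {"a.com", "b.com", "a.com", "c.com", "b.com"}
        Dim uniqueUrls As List(Of String) = RemoveDuplicates(urls)
        Console.WriteLine(String.Join(", ", uniqueUrls)) ' prints: a.com, b.com, c.com
    End Sub
End Module
```

    In the loop above, this would amount to replacing wordlist with a HashSet(Of String) and testing the result of Add instead of calling Contains. The comparison is case-sensitive by default; passing StringComparer.OrdinalIgnoreCase to the HashSet constructor would treat URLs that differ only in case as duplicates.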