excelvba

Sorting data algorithm 900000 rows of data


I have four columns AA, AB, AC, and AD with values that I want to sort after a certain pattern:

Column AA Column AB Column AC Column AD Column AE
Row 1 123 444 6666
Row 2 A tz s4 23
Row 3 1111 56 hh 23
Row 4 D 56 F 4
Row 5 56 F A
Row 6 456 55 3333 23
Row 7 A 333 A56 55555
Row 8 1 555 VBC A
Row 9 A 5899 B6 23
Row 10 2 TZU 98 56
  1. All purely numerical values must be moved to the front before all other alphanumerical values.

  2. Empty fields must be put to the end.

  3. The numerical value with the largest number of characters must be placed in the first column. For example, the value "55555" in Row 7 and Column AD must be after execution of the macro in Row 7 and Column AA.

The final result of the example above should look like this:

Column AA Column AB Column AC Column AD Column AE
Row 1 6666 123 444
Row 2 23 A tz s4
Row 3 1111 56 23 hh
Row 4 56 4 D F
Row 5 56 F A
Row 6 3333 456 55 23
Row 7 55555 333 A A56
Row 8 555 1 VBC A
Row 9 5899 23 A B6
Row 10 98 56 2 TZU

My solution is below. The problem is that I couldn't realize point 2, moving the empty field to the end. Also, I'm using loops, and I want to transform 900000 (!) rows of data. Running the macro like this takes days...any other solution is appreciated. Thank you.

Option Explicit

Sub resort()

    Dim i As Long
    Dim j As Long
    Dim temp As Range
    
    With Worksheets("Tabelle1")
    
    For j = 1 To 10
    
        For i = 2 To 15
        
            If IsNumeric(.Range("AA" & i)) = False And IsNumeric(.Range("AB" & i)) = True Then
            
                .Range("AB" & i).Copy Destination:=.Range("AE" & i)
                .Range("AA" & i).Copy Destination:=.Range("AB" & i)
                .Range("AE" & i).Copy Destination:=.Range("AA" & i)
                .Range("AE" & i).Clear
                            
            End If
        
        Next i
        
        For i = 2 To 15
        
            If IsNumeric(.Range("AB" & i)) = False And IsNumeric(.Range("AC" & i)) = True Then
            
                .Range("AC" & i).Copy Destination:=.Range("AE" & i)
                .Range("AB" & i).Copy Destination:=.Range("AC" & i)
                .Range("AE" & i).Copy Destination:=.Range("AB" & i)
                .Range("AE" & i).Clear
                            
            End If
        
        Next i
        
        For i = 2 To 15
        
            If IsNumeric(.Range("AC" & i)) = False And IsNumeric(.Range("AD" & i)) = True Then
            
                .Range("AD" & i).Copy Destination:=.Range("AE" & i)
                .Range("AC" & i).Copy Destination:=.Range("AD" & i)
                .Range("AE" & i).Copy Destination:=.Range("AC" & i)
                .Range("AE" & i).Clear
                            
            End If
        
        Next i
        
     Next j
        
        
     '++++++++++++++++++++++++++++++++++++++++++++++++++++'
        
     For j = 1 To 10
        
        For i = 2 To 15
        
            If IsNumeric(.Range("AB" & i)) = True Then
            
                If Len(.Range("AB" & i)) > Len(.Range("AA" & i)) Then
            
                    .Range("AB" & i).Copy Destination:=.Range("AE" & i)
                    .Range("AA" & i).Copy Destination:=.Range("AB" & i)
                    .Range("AE" & i).Copy Destination:=.Range("AA" & i)
                    .Range("AE" & i).Clear
                    
                End If
                            
            End If
        
        Next i
    
        For i = 2 To 15
        
            If IsNumeric(.Range("AC" & i)) = True Then
            
                If Len(.Range("AC" & i)) > Len(.Range("AB" & i)) Then
                
                    .Range("AC" & i).Copy Destination:=.Range("AE" & i)
                    .Range("AB" & i).Copy Destination:=.Range("AC" & i)
                    .Range("AE" & i).Copy Destination:=.Range("AB" & i)
                    .Range("AE" & i).Clear
                                
                End If
                
            End If
        
        Next i
      
        For i = 2 To 15

            If IsNumeric(.Range("AD" & i)) = True Then

                If Len(.Range("AD" & i)) > Len(.Range("AC" & i)) Then

                    .Range("AD" & i).Copy Destination:=.Range("AE" & i)
                    .Range("AC" & i).Copy Destination:=.Range("AD" & i)
                    .Range("AE" & i).Copy Destination:=.Range("AC" & i)
                    .Range("AE" & i).Clear

                End If

            End If

        Next i
    
    Next j
        
    End With

End Sub

Solution

  • You have 3 tasks to solve:

    a) You have a huge amount of data, so you need a fast routine.
    This is easy: Read all your data into memory on one go (into a 2-dimensional array). Work on that array. When everything is sorted, write the data back into Excel in one go.

    Sub sortMydata()
        With Worksheets("Tabelle1")
            Dim rowcount As Long
            rowcount = .Range("AA1").CurrentRegion.Rows.Count
            ' Read Excel data into 2-dimensional array
            Dim data
            data = .Range("AA1").Resize(rowcount, 4)
            ' Sort all rows
            For row = 2 To rowcount
                sortrow data, row
            Next
            ' Write sorted data back into sheet    
            .Range("AA1").Resize(rowcount, 4) = data
        End With
    End Sub
    

    b) You need to sort your data (row by row). For this we need a sorting algorithm. As we always sort only very few values (4 per row), a simple bubble sort is the best option. There are tons of implementations that can be found on the internet. The only thing we need to know is that we want to sort values of one row while most algorithms assume you want to sort data by one (or several) columns.

    Sub sortrow(data, row As Long)
        Dim i As Long, j As Long
        ' A simple Bubble Sort to sort the values of one Row
        For i = LBound(data, 2) To UBound(data, 2) - 1
            For j = i To UBound(data, 2)
                If sortBefore(data(row, j), data(row, i)) Then
                    Dim tmp As Variant
                    tmp = data(row, i)
                    data(row, i) = data(row, j)
                    data(row, j) = tmp
                End If
            Next
        Next
    End Sub
    

    c) You need an algorithm that compares 2 values to decide which one will come "first". The sorting algorithm will use that to sort your data.

    Function sortBefore(v1 As Variant, v2 As Variant) As Boolean
        If v1 = "" Then
            sortBefore = False            ' Blanks to the end
        ElseIf IsNumeric(v1) Then
            If IsNumeric(v2) Then
                sortBefore = v1 > v2      ' Compare Numeric values: Larger first
            Else
                sortBefore = True         ' Number before string
            End If
        Else
            If IsNumeric(v2) Then
                sortBefore = False        ' String after numeric
            Else
                sortBefore = LCase(v1) < LCase(v2)      ' Compare string values: Smaller first
            End If
        End If
    End Function
    

    It took 1 or 2 seconds to run it with 100.000 rows of data.