vbaalgorithmms-accesstime-complexityms-access-2013

In MS Access using VBA: I want to transfer records with less than N^2 time complexity


I have one orders table with 5000 orders. And I have one customers table with 5000 customers.

Some orders have null at values "CustomerID" field which I want to fix with an algorithm.

Currently, for each null value at "CustomersID" in the Orders table the algorithm considers three fields from the orders table ( 1. name, 2. country, 3. cluster) and loop through the customers table to find a match with identical fields in the customers table (1. name, 2. country, 3. cluster). If there is a match I add the "CustomerID" in the orders table (replacing null value at "CustomersID", otherwise I first create a new customer record in the Customers table and then add "CustomerID" in the Orders table.

The problem is slow algorithm. When looping though all fields that have null values in orders table (worst case 5000) and for each field try to find a match in the Customers table (worst case 5000 if there is no match) then the total time complexity becomes quadratic. N^2.

Thanks in advance! Attached is my code. The first method opens record set of orders table (rss). It then loops through orders recordset to find null values. The second method opens record set for the Customers table (rst). It then loops through to find a match in fields shared from the orders table (1. name 2. country 3. cluster).

Public Sub TransferNullValues() 'hard coded parameters are "Orders", "Customers"
    Set coll = New Collection
    coll.Add "CustomerID"
    coll.Add "End_customer"
    coll.Add "End_customer_country"
    coll.Add "Cluster"
    
    Dim sqlStr As String
    sqlStr = "SELECT " & buildSql(coll) & " FROM Orders where Orders.CustomerID is null;"
    coll.Remove (1) 'remove customer id
    
    Dim rss As DAO.Recordset
    Set rss = Basic.getRs(sqlStr)
    
    Do While Not rss.EOF
        Dim locals As Collection
        Set locals = New Collection
        
        Dim v As Variant
        For Each v In coll
            locals.Add (rss(v).value)
        Next
        
        Dim id As Integer
        id = getCustID(locals)
        
        rss.Edit
        rss("CustomerID").value = id
        rss.Update
            
        rss.MoveNext
    Loop
    
    rss.Close
    Set rss = Nothing
    
End Sub

Private Function getCustID(locals As Collection) As Integer
    Dim sqlStr As String
    sqlStr = "SELECT * FROM CUSTOMERS;"
    Dim rst As DAO.Recordset
    Set rst = Basic.getRs(sqlStr)
    Dim trgt As Collection
    
    
    Do While Not rst.EOF
        Set trgt = New Collection
        
        Dim v As Variant
        For Each v In coll
            trgt.Add (rst(v).value)
        Next
        
        Dim allExists As Boolean
        allExists = True
        
        For Each v In locals
            If Not Basic.Exists(trgt, v) Then
                allExists = False
                Exit For
            End If
        Next
        
        If allExists Then
            getCustID = rst("CustomerID").value
            'Debug.Print "customer exists in customers table"
            Exit Function
        End If
    
        rst.MoveNext
        
    Loop
    
    rst.AddNew
    Dim count As Integer
    count = 1
    For Each v In locals
        rst.Fields(count).value = v
        count = count + 1
    Next
    
    Dim id As Integer
    id = rst("CustomerID")
    
    rst.Update
        rst.Close
    Set rst = Nothing
    
    getCustID = id
End Function

I thought about how to decrease the time complexity but could not find a solution.


Solution

  • You can do this more efficiently with an UPDATE query:

    UPDATE Orders
        INNER JOIN Customers ON
            (Nz(Orders.name)    = Nz(Customers.name)) AND
            (Nz(Orders.country) = Nz(Customers.country)) AND
            (Nz(Orders.cluster) = Nz(Customers.cluster))
    SET Orders.CustomerId = Customers.CustomerId
    WHERE Orders.CustomerId Is Null
    

    The JOIN will be more efficient than O(n2). My guess is O(n·log(n)).

    As commenters have pointed out, you must use the Nz() function to convert possible Null values into non-null values (an empty string or the number 0, etc.) for the comparisons to work properly.