I have one orders table with 5000 orders. And I have one customers table with 5000 customers.
Some orders have null at values "CustomerID" field which I want to fix with an algorithm.
Currently, for each null value at "CustomersID" in the Orders table the algorithm considers three fields from the orders table ( 1. name, 2. country, 3. cluster) and loop through the customers table to find a match with identical fields in the customers table (1. name, 2. country, 3. cluster). If there is a match I add the "CustomerID" in the orders table (replacing null value at "CustomersID", otherwise I first create a new customer record in the Customers table and then add "CustomerID" in the Orders table.
The problem is slow algorithm. When looping though all fields that have null values in orders table (worst case 5000) and for each field try to find a match in the Customers table (worst case 5000 if there is no match) then the total time complexity becomes quadratic. N^2.
Thanks in advance! Attached is my code. The first method opens record set of orders table (rss). It then loops through orders recordset to find null values. The second method opens record set for the Customers table (rst). It then loops through to find a match in fields shared from the orders table (1. name 2. country 3. cluster).
Public Sub TransferNullValues() 'hard coded parameters are "Orders", "Customers"
Set coll = New Collection
coll.Add "CustomerID"
coll.Add "End_customer"
coll.Add "End_customer_country"
coll.Add "Cluster"
Dim sqlStr As String
sqlStr = "SELECT " & buildSql(coll) & " FROM Orders where Orders.CustomerID is null;"
coll.Remove (1) 'remove customer id
Dim rss As DAO.Recordset
Set rss = Basic.getRs(sqlStr)
Do While Not rss.EOF
Dim locals As Collection
Set locals = New Collection
Dim v As Variant
For Each v In coll
locals.Add (rss(v).value)
Next
Dim id As Integer
id = getCustID(locals)
rss.Edit
rss("CustomerID").value = id
rss.Update
rss.MoveNext
Loop
rss.Close
Set rss = Nothing
End Sub
Private Function getCustID(locals As Collection) As Integer
Dim sqlStr As String
sqlStr = "SELECT * FROM CUSTOMERS;"
Dim rst As DAO.Recordset
Set rst = Basic.getRs(sqlStr)
Dim trgt As Collection
Do While Not rst.EOF
Set trgt = New Collection
Dim v As Variant
For Each v In coll
trgt.Add (rst(v).value)
Next
Dim allExists As Boolean
allExists = True
For Each v In locals
If Not Basic.Exists(trgt, v) Then
allExists = False
Exit For
End If
Next
If allExists Then
getCustID = rst("CustomerID").value
'Debug.Print "customer exists in customers table"
Exit Function
End If
rst.MoveNext
Loop
rst.AddNew
Dim count As Integer
count = 1
For Each v In locals
rst.Fields(count).value = v
count = count + 1
Next
Dim id As Integer
id = rst("CustomerID")
rst.Update
rst.Close
Set rst = Nothing
getCustID = id
End Function
I thought about how to decrease the time complexity but could not find a solution.
You can do this more efficiently with an UPDATE query:
UPDATE Orders
INNER JOIN Customers ON
(Nz(Orders.name) = Nz(Customers.name)) AND
(Nz(Orders.country) = Nz(Customers.country)) AND
(Nz(Orders.cluster) = Nz(Customers.cluster))
SET Orders.CustomerId = Customers.CustomerId
WHERE Orders.CustomerId Is Null
The JOIN will be more efficient than O(n2). My guess is O(n·log(n)).
As commenters have pointed out, you must use the Nz()
function to convert possible Null
values into non-null values (an empty string or the number 0, etc.) for the comparisons to work properly.