I'm writing some code to reformat some datasets. At the end, I have some empty columns in my output dataset, which I need to populate by looking up values from another sheet (Index-Match effectively).
I have tried many approaches, and two were successful. The problem is, they are far too slow, and unfortunately I am limited to using VBA. To prefix, I'm aware this problem could be solved by simply writing the Index-Match formula into the cells manually at the end and dragging them down, but I am trying to make life as simple as possible for the people who will need to perform this task monthly (also to reduce room for error).
Background info:
ws_ps example Column B as lookup value, with Column E to populate if a row in Column B is not empty
ws_priips example Column G corresponds to the lookup value in ws_ps Column B, with the value to look for coming from Column E
Approach 1: Looping through each row and using WorksheetFunction - 12 min runtime
On Error Resume Next
last_row_ps = ws_ps.UsedRange.Rows.Count
For ps_row = 2 To last_row_ps
ws_ps.Cells(ps_row, 5).Value = WorksheetFunction.IfError(WorksheetFunction.Index(ws_priips.Range("A:G"), _
WorksheetFunction.Match(ws_ps.Cells(ps_row, 2), ws_priips.Range("G:G"), 0), 5), "")
Next ps_row
On Error GoTo -1
Approach 2: Loading worksheets into arrays and writing to worksheet if conditions are met - 15min runtime
last_row_ps = ws_ps.UsedRange.Rows.Count
last_row_priips = ws_priips.UsedRange.Rows.Count
ps_array = ws_ps.Range("A1:X" & last_row_ps).Value
priips_array = ws_priips.Range("A1:AZ" & last_row_priips).Value
For ps_row = 2 To UBound(ps_array, 1)
For priips_row = 2 To UBound(priips_array, 1)
If ps_array(ps_row, 2) <> "" Then
If ps_array(ps_row, 2) = priips_array(priips_row, 7) Then
ws_ps.Cells(ps_row, 5).Value = priips_array(priips_row, 5)
GoTo SkipLoop
End If
Else
GoTo SkipLoop
End If
Next priips_row
SkipLoop:
Next ps_row
Am I out of luck? So far I've only implemented these solutions for 1 column, but I will need to apply it to around 10. Is there any way to drastically speed things up, without resorting to typing the functions into the worksheet or using python? I would be grateful for any pointers. I am still very much a beginner.
Read the data from the lookup sheet into a 2D array, then build a dictionary object with the ISIN value as the key, and the array "row" number as the value. Use the dictionary to extract matched row values from the array to populate to the sheet being filled. Faster if you first pull the "to be filled" and ISIN columns from the output dataset sheet, fill them using the dictionary and the lookup array, then at the end write them back to the output sheet. Basically try to avoid any cell-by-cell operations.
Using the code below, with both tables containing 150k rows (second table sorted randomly), I filled two columns in the first table in 3-4 sec.
Note if you need cross-platform support you can use a Collection in place of the dictionary, which is not available on Mac.
Sub Tester()
Dim arrDest, arrSrc, ws As Worksheet, isin As String
Dim dict As Object, r As Long, rMatch As Long
Dim rngDest As Range
'both my tables are on one sheet for testing
Set ws = ThisWorkbook.Worksheets("Sheet1")
'table to be filled
Set rngDest = ws.Range("A1").CurrentRegion
arrDest = rngDest.Value 'read to 2D array
'data used as lookup table, as an array
arrSrc = ws.Range("I1").CurrentRegion.Value
Set dict = CreateObject("scripting.dictionary")
'map isin to row number in lookup table array
For r = 2 To UBound(arrSrc, 1)
isin = arrSrc(r, 1)
If Len(isin) > 0 Then dict(isin) = r 'assumes no duplicates in lookup table
Next r
'loop the report table and try to match values in the lookup table
For r = 2 To UBound(arrDest, 1)
isin = arrDest(r, 1)
If Len(isin) > 0 Then
If dict.Exists(isin) Then 'have a match?
rMatch = dict(isin) 'matched row
arrDest(r, 3) = arrSrc(rMatch, 2) 'copy a couple of values
arrDest(r, 5) = arrSrc(rMatch, 3)
End If 'have match
End If 'not zero-length
Next r
rngDest.Value = arrDest 'replace data with updated array
End Sub
My test tables (shorter version):
Added: for anyone who needs cross-platform support (Windows/Mac) here's the basic "row map" approach using a Collection in place of a Dictionary. Loading the collection with 150k items takes about 10% longer than loading a dictionary of the same size, but retrieval by key is actually about 5x faster using a collection. Also seems like the Collection performance scales better for larger numbers of keys - for 400k items load and read is approx. 2.2 and 0.7 sec but for dictionary it's approx. 10.5 and 10 sec (much slower)
'Using a Collection like a dictionary for mapping column values
' to their position in a dataset
Sub CollectionTest()
Dim arr, r As Long, col As Collection, t, k As String, v
Set col = New Collection
'Source range A1:A150000 filled with "Val_000001", "Val_000002", etc
' sorted randomly
arr = Range("A1").CurrentRegion.Value
t = Timer
For r = 1 To UBound(arr, 1)
k = arr(r, 1)
'Add the row number as value and cell content as key
' note your row keys should be unique
If IsEmpty(KeyValue(col, k)) Then col.Add r, k
Next r
Debug.Print col.Count & " items"
Debug.Print "Loaded row map in " & Timer - t
t = Timer
For r = 1 To 150000
k = "Val_" & Format(r, "000000")
v = KeyValue(col, k)
If r < 5 Then Debug.Print "Key " & k & " at row# " & v
Next r
Debug.Print "Retrieved values in " & Timer - t
End Sub
'Retrieve value for key `k` from collection `col`
' Returns Empty if there's no such key
Function KeyValue(col As Collection, k As String)
On Error Resume Next 'ignore error if no match for `k`
KeyValue = col.Item(k)
End Function