libreoffice-basicopenoffice-basic

How to count duplicate entries in OpenOffice/LibreOffice BASIC?


I have a gnarly amount of data across many sheets in LibreOffice -- an ADDRESS column and a DATA column -- and I'd like to count the number of times each address occurs, put into a NUM_ADDR column. E.g.:

ADDR             | DATA             | NUM_ADDR
00000000bbfe22d0 | 876d4eb163886d4e | 1
00000000b9dfffd0 | 4661bada6d4661ba | 1
00000000b9dfc3d0 | 5d4b40b4705d4b40 | 1
00000000b9def7d0 | 8f8570a5808f8570 | 1
00000000b9de17d0 | 63876d4eb163886d | 1
00000000b9dddfd0 | 6d4eb163886d4eb1 | 3
00000000b9dddfd0 | 705d4b40b4705d4b | 
00000000b9dddfd0 | b4705d4b40b4705d | 
00000000b7df83d0 | 40b4705d4b40b470 | 1
00000000b7d607d0 | 705d4b40b4705d4b | 1
...

When doing things manually I used the COUNTIF function on each address, but I've found that a macro would save time in the long run. Here's a snippet of what I have so far, given that a previous function has already determined the length (number of rows) of the data, stored in RowCounter:

Dim CountedAddr(RowCounter, RowCounter) as String
Dim CountedAddrPtr as Integer
Dim CurrentCell as Object
Dim i as Integer

CountedAddrPtr = 0

' Populate CountedAddr array
For i = 1 to RowCounter-1
  CurrentCell = CurrentSheet.getCellByPosition(0, i)
  If Not CurrentCell.String In CountedAddr(?) Then
    CurrentSheet.getCellByPosition(2, i).Value = 1 ' for debugging
    CountedAddr(CountedAddrPtr, 0) = CurrentCell.String
    CountedAddrPtr = CountedAddrPtr + 1
  Else
    CurrentSheet.getCellByPosition(2, i).Value = 0 ' for debugging
  EndIf
Next

' For each unique address, count number of occurances
For i = 0 to UBound(CountedAddr())
  For j = 1 to RowCounter-1
    If CurrentSheet.getCellByPosition(0, j).String = CountedAddr(i, 0) Then
      CountedAddr(i, 1) = CountedAddr(i, 1)+1
    EndIf
  Next
Next

' Another function to populate NUM_ADDR from CountedAddr array...

So my first question is: how can we determine if an element (the address in the current cell) is in the CountedAddr array (see the (?) above)? Second, is there a much more efficient way to achieve the second block of code? Unfortunately sorting is out of the question, since the chronology of the addresses and data form something of a time base. Third, is the whole shebang a foolish way to attack this problem?

Many thanks from a hardware dood on a software task!


Solution

  • Dictionary-type objects such as a VB6 Collection are efficient for looking up items, because it finds the key directly rather than looping through a long array. Our countedAddrs collection below will store a count for each address.

    Sub CountAddrs
        Dim countedAddrs As New Collection
        Dim oCurrentSheet As Object
        Dim oCurrentCell As Object
        Dim currentAddr As String
        Dim i As Integer
        Dim newCount As Integer
        Dim rowCounter As Integer
        Const ADDR_COL = 0
        Const COUNT_COL = 2
    
        oCurrentSheet = ThisComponent.CurrentController.ActiveSheet
        rowCounter = 11
        ' Populate countedAddrs array.
        For i = 1 to rowCounter - 1
          oCurrentCell = oCurrentSheet.getCellByPosition(ADDR_COL, i)
          currentAddr = oCurrentCell.String
          If Contains(countedAddrs, currentAddr) Then
            ' Increment the count.
            newCount = countedAddrs.Item(currentAddr) + 1
            countedAddrs.Remove(currentAddr) 
            countedAddrs.Add(newCount, currentAddr)
            oCurrentSheet.getCellByPosition(COUNT_COL, i).Value = newCount ' for debugging
          Else
            countedAddrs.Add(1, currentAddr)
            oCurrentSheet.getCellByPosition(COUNT_COL, i).Value = 1 ' for debugging
          EndIf
        Next
    End Sub
    

    This code requires the following helper function. In most languages, dictionary objects have this functionality built-in, but Basic is rather simplistic.

    ' Returns True if the collection contains the key, otherwise False.
    Function Contains(coll As Collection, key As Variant)
        On Error Goto ErrorHandler
        coll.Item(key)
        Contains = True
        Exit Function
    ErrorHandler:
        Contains = False
    End Function