regexexcelvbaframemaker

Regex search in binary file


I try to write an Excel VBA script which fetches some information (version and revision date) out of a binary FrameMaker file (*.fm).

Following sub opens the *.fm file and writes the first 25 lines (the informationen needed is into this first 25 lines) into a variable.

Sub fetchDate()
    Dim fso As Object
    Dim fmFile As Object

    Dim fileString As String
    Dim fileName As String
    Dim matchPattern As String
    Dim result As String
    Dim i As Integer
    Dim bufferString As String

    Set fso = CreateObject("Scripting.FileSystemObject")

    fileName = "C:\FrameMaker-file.fm"

    Set fmFile = fso.OpenTextFile(fileName, ForReading, False, TristateFalse)
    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"

    fileString = ""
    i = 1
    Do While i <= 25
        bufferString = fmFile.ReadLine
        fileString = fileString & bufferString & vbNewLine
        i = i + 1
    Loop
    fmFile.Close

    'fileString = Replace(fileString, matchPattern, "")
    result = regExSearch(fileString, matchPattern)

    MsgBox result

    Set fso = Nothing
    Set fmFile = Nothing
End Sub

The regex functions looks like this:

Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
    Dim regEx As New RegExp

    Dim strReplace As String
    Dim result As String
    Dim match As Variant
    Dim matches As Variant
    Dim subMatch As Variant

    Set regEx = CreateObject("VBScript.RegExp")

    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.test(strInput) Then
            Set matches = regEx.Execute(strPattern)

            For Each match In matches
                If match.SubMatches.Count > 0 Then
                    For Each subMatch In match.SubMatches
                        Debug.Print "match:" & subMatch
                    Next subMatch
                End If
            Next match

            regExSearch = result
        Else
            regExSearch = "no match"
        End If
    End If

    Set regEx = Nothing
End Function

Problem 1:

The content of the binary *.fm file which is saved in the variable "fileString" differs in every run, although the *.fm file stays the same.

Here are a few examples of the first three lines from different runs which are saved in "fileString":

example 1

<MakerFile 12.0>


Aaÿ No.009.xxx  ????          /tEXt     ??????

example 2

<MakerFile 12.0>


Aaÿ  `      ? ????          /tEXt ?     c ? E     ? ????a A ? ?      ? ? ? d??????? ?        Heading ????????????A???????A

As you can see example 1 differs from example 2 although it was the very same VBA code and the very same *.fm file.

Problem 2:

It is also a big problem that the regex search string from "matchPattern" is written randomly into my "fileString". Here is a screenshot from the debug console:

parts of value of matchPattern

How can this be? Any suggestions or ideas to fix this problem?

I'm using:

MS Office Professional Plus 2010

VBA reference for regex: Microsoft VBScript Regular Expressions 5.5

Thank you very much in advance!

Regards, Andy

/edit March 12th, 2018:

Here is a sample *.fm file: sample file If you open it with notepad, you can see some information like "Version - DateVersion 4 – 2018/Feb/07" and "Rev02 - 2018/Feb/21" in plain text. I want to fetch this information with a regular expression.


Solution

  • I found a solution with using ADODB.streams. This works fine:

    Sub test_binary()
        Dim regEx As Object
    
        Dim buffer As String
        Dim filename As String
        Dim matchPattern As String
        Dim result As String
    
        Set regEx = CreateObject("VBScript.RegExp")
    
        filename = "C:\test.fm"
    
        With CreateObject("ADODB.Stream")
            .Open
            .Type = 2
            .Charset = "utf-8"
            .LoadFromFile filename
            buffer = .Readtext(10000)
            .Close
        End With
    
        matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    
        result = regExSearch(buffer, matchPattern)
    
        MsgBox result
    End Sub
    

    regex function:

    Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
        Dim regEx As New RegExp
    
        Dim result As String
        Dim match As Variant
        Dim matches As Variant
        Dim subMatch As Variant
    
        Set regEx = CreateObject("VBScript.RegExp")
    
        If strPattern <> "" Then
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
    
            If regEx.test(strInput) Then
                Set matches = regEx.Execute(strInput)
    
                result = ""
                For Each match In matches
                    If match.SubMatches.Count > 0 Then
                        For Each subMatch In match.SubMatches
                            If Len(result) > 0 Then
                                result = result & "||"
                            End If
                            result = result & subMatch
                        Next subMatch
                    End If
                Next match
    
                regExSearch = result
            Else
                regExSearch = "err_nomatch"
            End If
        End If
    
        Set regEx = Nothing
    End Function
    

    It is important to open the *.fm file as text file (.Type = 2) and set the charset to "utf-8". Otherwise I wont have plain text for my regular expression to read.

    Thank you very much for bringing me on the right way!