I try to write an Excel VBA script which fetches some information (version and revision date) out of a binary FrameMaker file (*.fm).
Following sub opens the *.fm file and writes the first 25 lines (the informationen needed is into this first 25 lines) into a variable.
Sub fetchDate()
Dim fso As Object
Dim fmFile As Object
Dim fileString As String
Dim fileName As String
Dim matchPattern As String
Dim result As String
Dim i As Integer
Dim bufferString As String
Set fso = CreateObject("Scripting.FileSystemObject")
fileName = "C:\FrameMaker-file.fm"
Set fmFile = fso.OpenTextFile(fileName, ForReading, False, TristateFalse)
matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
fileString = ""
i = 1
Do While i <= 25
bufferString = fmFile.ReadLine
fileString = fileString & bufferString & vbNewLine
i = i + 1
Loop
fmFile.Close
'fileString = Replace(fileString, matchPattern, "")
result = regExSearch(fileString, matchPattern)
MsgBox result
Set fso = Nothing
Set fmFile = Nothing
End Sub
The regex functions looks like this:
Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
Dim regEx As New RegExp
Dim strReplace As String
Dim result As String
Dim match As Variant
Dim matches As Variant
Dim subMatch As Variant
Set regEx = CreateObject("VBScript.RegExp")
If strPattern <> "" Then
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set matches = regEx.Execute(strPattern)
For Each match In matches
If match.SubMatches.Count > 0 Then
For Each subMatch In match.SubMatches
Debug.Print "match:" & subMatch
Next subMatch
End If
Next match
regExSearch = result
Else
regExSearch = "no match"
End If
End If
Set regEx = Nothing
End Function
Problem 1:
The content of the binary *.fm file which is saved in the variable "fileString" differs in every run, although the *.fm file stays the same.
Here are a few examples of the first three lines from different runs which are saved in "fileString":
example 1
<MakerFile 12.0>
Aaÿ No.009.xxx ???? /tEXt ??????
example 2
<MakerFile 12.0>
Aaÿ ` ? ???? /tEXt ? c ? E ? ????a A ? ? ? ? ? d??????? ? Heading ????????????A???????A
As you can see example 1 differs from example 2 although it was the very same VBA code and the very same *.fm file.
Problem 2:
It is also a big problem that the regex search string from "matchPattern" is written randomly into my "fileString". Here is a screenshot from the debug console:
How can this be? Any suggestions or ideas to fix this problem?
I'm using:
MS Office Professional Plus 2010
VBA reference for regex: Microsoft VBScript Regular Expressions 5.5
Thank you very much in advance!
Regards, Andy
/edit March 12th, 2018:
Here is a sample *.fm file: sample file If you open it with notepad, you can see some information like "Version - DateVersion 4 – 2018/Feb/07" and "Rev02 - 2018/Feb/21" in plain text. I want to fetch this information with a regular expression.
I found a solution with using ADODB.streams. This works fine:
Sub test_binary()
Dim regEx As Object
Dim buffer As String
Dim filename As String
Dim matchPattern As String
Dim result As String
Set regEx = CreateObject("VBScript.RegExp")
filename = "C:\test.fm"
With CreateObject("ADODB.Stream")
.Open
.Type = 2
.Charset = "utf-8"
.LoadFromFile filename
buffer = .Readtext(10000)
.Close
End With
matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
result = regExSearch(buffer, matchPattern)
MsgBox result
End Sub
regex function:
Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
Dim regEx As New RegExp
Dim result As String
Dim match As Variant
Dim matches As Variant
Dim subMatch As Variant
Set regEx = CreateObject("VBScript.RegExp")
If strPattern <> "" Then
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set matches = regEx.Execute(strInput)
result = ""
For Each match In matches
If match.SubMatches.Count > 0 Then
For Each subMatch In match.SubMatches
If Len(result) > 0 Then
result = result & "||"
End If
result = result & subMatch
Next subMatch
End If
Next match
regExSearch = result
Else
regExSearch = "err_nomatch"
End If
End If
Set regEx = Nothing
End Function
It is important to open the *.fm file as text file (.Type = 2) and set the charset to "utf-8". Otherwise I wont have plain text for my regular expression to read.
Thank you very much for bringing me on the right way!