xml windows powershell xml-parsing command-line-tool

Extract embedded XML data from an audio file in Windows


We have a platform that records our call centre calls and, at the end of each WAV file, appends some XML that holds important metadata about that call.

I'm trying to read a folder of these WAV files and pull the metadata into a list for a user. Their preference is for the list to be in Excel. However, I'm struggling to find a method that works reliably on a normal Windows computer without installing anything special, such as Python.

Excel has an XML import function, but it fails because the XML is at the end of the file: Excel reads from the start and gets confused by the audio data. I just need to skip down to <recording> and read from there until </recording>.

I've tried the following in PowerShell:

$directory = "C:\test"
$wavFiles = Get-ChildItem -Path $directory -Filter *.wav

foreach ($file in $wavFiles) {
    Write-Host "Processing file: $($file.Name)"
    $content = Get-Content -Path $file.FullName -Raw -Encoding Byte
    $decodedContent = [System.Text.Encoding]::UTF8.GetString($content)
    $match = [regex]::Match($decodedContent, '<recording>.+?</recording>')
    if ($match.Success) {
        $xmlContent = $match.Value
        Write-Host "Found XML in file $($file.Name):"
        Write-Host $xmlContent
    } else {
        Write-Host "No XML found in file $($file.Name)."
    }
}

This correctly locates the files, but fails to find the XML, even though the XML can be seen when opening the file in a text editor like Notepad++:

Processing file: 131346032527__8115_02-13-2024-11-12-58.wav

No XML found in file 131346032527__8115_02-13-2024-11-12-58.wav.

Any ideas?


Solution

  • Note:


    By default, . in the .NET regex engine matches any character except a newline character (\n).
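If the embedded XML spans more than one line (an assumption, but it would explain the "No XML found" output), then .+? can never get from <recording> to </recording>. The inline option (?s), equivalent to RegexOptions.Singleline, makes . match newlines as well. Below is a minimal sketch of the question's loop with that one change. The helper name Get-EmbeddedRecordingXml is hypothetical, and the bytes are read with [System.IO.File]::ReadAllBytes only because -Encoding Byte is not available for Get-Content in PowerShell 7+.

```powershell
# Sketch only: Get-EmbeddedRecordingXml is a hypothetical helper name.
function Get-EmbeddedRecordingXml {
    param([Parameter(Mandatory)][string]$Path)

    # ReadAllBytes works in both Windows PowerShell 5.1 and PowerShell 7+.
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    # Decode as in the question; audio bytes become junk characters,
    # but the plain-text XML at the end of the file survives.
    $text = [System.Text.Encoding]::UTF8.GetString($bytes)

    # (?s) turns on RegexOptions.Singleline, so '.' also matches newlines.
    $match = [regex]::Match($text, '(?s)<recording>.+?</recording>')
    if ($match.Success) { $match.Value } else { $null }
}

$directory = 'C:\test'
if (Test-Path $directory) {
    foreach ($file in Get-ChildItem -Path $directory -Filter *.wav) {
        $xmlText = Get-EmbeddedRecordingXml -Path $file.FullName
        if ($xmlText) {
            Write-Host "Found XML in file $($file.Name):"
            Write-Host $xmlText
        } else {
            Write-Host "No XML found in file $($file.Name)."
        }
    }
}
```

If the extracted text is well-formed XML, casting it with [xml]$xmlText should give an object whose elements can be read as properties, and the resulting values could then be written out with Export-Csv for opening in Excel. Which elements to pick depends on the actual metadata, which the question does not show.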