
PowerShell script for converting HTML to RTF using Microsoft Word


I needed a PowerShell script for converting an HTML document to RTF using Microsoft Word. None of the publicly available snippets worked for me.


Solution

  • This works in 2023 with Microsoft Word 365 and PowerShell 7.4.0 on Windows 10 22H2.

    For adapting this script to other formats:

    The script:

    $ScriptLocation = Split-Path -Parent $Script:MyInvocation.MyCommand.Path
    
    $InputFile="$ScriptLocation\source.htm"
    $OutputFile="$ScriptLocation\target.rtf"
    
    # First load MS Office base assemblies into the environment
    Add-Type -Path $env:WINDIR\assembly\GAC_MSIL\office\*\office.dll #-PassThru  # Use -PassThru to verify it worked (otherwise, fails silently)
    # Then, add the target MS Word assembly
    Get-ChildItem -Path $env:windir\assembly -Recurse -Filter Microsoft.Office.Interop.Word* -File | ForEach-Object {
        Add-Type -LiteralPath ($_.FullName) #-PassThru # Use -PassThru to verify it worked (otherwise, fails silently)
    }
    
    # Parameters for Documents.OpenNoRepairDialog, collected in a hashtable
    $DOCOpen = @{}
    $DOCOpen.FileName=$InputFile
    $DOCOpen.ConfirmConversions=[Microsoft.Office.Core.MsoTriState]::msoFalse
    $DOCOpen.ReadOnly=[Microsoft.Office.Core.MsoTriState]::msoTrue
    $DOCOpen.AddToRecentFiles=[Microsoft.Office.Core.MsoTriState]::msoFalse
    $DOCOpen.PasswordDocument=""
    $DOCOpen.PasswordTemplate=""
    $DOCOpen.Revert=$true
    $DOCOpen.WritePasswordDocument=""
    $DOCOpen.WritePasswordTemplate=""
    # Or use ::wdOpenFormatAuto
    $DOCOpen.Format=[Microsoft.Office.Interop.Word.WdOpenFormat]::wdOpenFormatWebPages
    $DOCOpen.Encoding=[Microsoft.Office.Core.MsoEncoding]::msoEncodingUTF8
    $DOCOpen.Visible=$false
    $DOCOpen.OpenAndRepair=$false
    $DOCOpen.DocumentDirection=[Microsoft.Office.Interop.Word.WdDocumentDirection]::wdLeftToRight
    $DOCOpen.NoEncodingDialog=$true
    $DOCOpen.XMLTransform=""
    
    # Create MS Office object
    $appWord = New-Object -ComObject Word.Application
    
    # Keep the Word application window hidden
    $appWord.Visible = $False
    
    # Suppress document macros
    $appWord.AutomationSecurity = [Microsoft.Office.Core.MsoAutomationSecurity]::msoAutomationSecurityForceDisable
    
    # Suppress alerts and dialogs
    $appWord.DisplayAlerts = [Microsoft.Office.Interop.Word.WdAlertLevel]::wdAlertsNone
    
    # Word specific settings
    $appWord.ScreenUpdating = $False
    $appWord.DisplayRecentFiles = $False
    $appWord.DisplayScrollBars = $False
    
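    $SaveFormat = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatRTF
    
    # Open the HTML file without the repair dialog; the [ref] arguments must follow the method's positional parameter order
    $DOCDocument = $appWord.Documents.OpenNoRepairDialog(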
        [ref]$DOCOpen.FileName,
        [ref]$DOCOpen.ConfirmConversions,
        [ref]$DOCOpen.ReadOnly,
        [ref]$DOCOpen.AddToRecentFiles,
        [ref]$DOCOpen.PasswordDocument,
        [ref]$DOCOpen.PasswordTemplate,
        [ref]$DOCOpen.Revert,
        [ref]$DOCOpen.WritePasswordDocument,
        [ref]$DOCOpen.WritePasswordTemplate,
        [ref]$DOCOpen.Format,
        [ref]$DOCOpen.Encoding,
        [ref]$DOCOpen.Visible,
        [ref]$DOCOpen.OpenAndRepair,
        [ref]$DOCOpen.DocumentDirection,
        [ref]$DOCOpen.NoEncodingDialog,
        [ref]$DOCOpen.XMLTransform
      )
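    # Save the opened document in RTF format, then close it
    $DOCDocument.SaveAs("$OutputFile", [ref]$SaveFormat)
    $DOCDocument.Close()
    
    # Revert macro security to the level chosen in Word's UI, then quit Word
    $appWord.AutomationSecurity = [Microsoft.Office.Core.MsoAutomationSecurity]::msoAutomationSecurityByUI
    $appWord.Quit()
    
    # Release the COM reference so this script no longer keeps the Word process alive
    [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($appWord)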
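
    The output format is selected only by $SaveFormat. To reuse the script for other targets, one possible approach is sketched below. It uses a hypothetical helper, Get-WordSaveFormat, which is not part of Word or the original script. The helper maps an output file extension to the numeric WdSaveFormat value, so that $SaveFormat = Get-WordSaveFormat $OutputFile could replace the hard-coded wdFormatRTF. The numbers are the WdSaveFormat enum values. Which format each extension maps to is an assumption you may want to change.

    ```powershell
    # Hypothetical helper (not part of the original script): map an output
    # file extension to the numeric Word WdSaveFormat value used by SaveAs.
    function Get-WordSaveFormat {
        param(
            [Parameter(Mandatory)]
            [string]$Path
        )

        # GetExtension returns e.g. ".rtf"; switch string matching is case-insensitive
        switch ([System.IO.Path]::GetExtension($Path)) {
            '.rtf'  { return 6 }   # wdFormatRTF
            '.txt'  { return 2 }   # wdFormatText
            '.htm'  { return 10 }  # wdFormatFilteredHTML
            '.html' { return 10 }  # wdFormatFilteredHTML
            '.docx' { return 16 }  # wdFormatDocumentDefault
            '.pdf'  { return 17 }  # wdFormatPDF
            '.xps'  { return 18 }  # wdFormatXPS
            '.odt'  { return 23 }  # wdFormatOpenDocumentText
            default { throw "No Word save format is mapped for '$Path'." }
        }
    }
    ```

    When the interop assembly is loaded, the named members work the same way, e.g. [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatPDF. The numeric form only avoids depending on the assembly inside the helper.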
    
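    The script does not confirm that SaveAs actually produced an RTF file. One minimal sanity check relies on the fact that every RTF document begins with the characters {\rtf. It assumes that inspecting the file header is enough for your needs. The hypothetical Test-RtfFile function below, which is not part of the original script, reads the first five bytes of the output file.

    ```powershell
    # Hypothetical post-conversion check (not part of the original script):
    # an RTF document always begins with the ASCII characters "{\rtf".
    function Test-RtfFile {
        param(
            [Parameter(Mandatory)]
            [string]$Path
        )

        if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
            return $false
        }

        # Resolve to a full path, because .NET file APIs ignore PowerShell's current location
        $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
        $buffer = [byte[]]::new(5)
        $stream = [System.IO.File]::OpenRead($fullPath)
        try {
            $count = $stream.Read($buffer, 0, $buffer.Length)
        }
        finally {
            $stream.Dispose()
        }

        # Case-sensitive comparison: RTF control words are lowercase
        return (($count -eq 5) -and ([System.Text.Encoding]::ASCII.GetString($buffer) -ceq '{\rtf'))
    }
    ```

    For example, adding if (-not (Test-RtfFile $OutputFile)) { Write-Error "Conversion failed: $OutputFile" } after the conversion would flag a run that did not produce RTF output.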

    This is mainly based on the following: