Importing PDF into Word using PowerShell


I'm running the code below via PowerShell to convert large PDF files into Word documents, but I keep running into the error shown further down. I've explicitly cast the confirmConversions/link/attachment parameters to an object, but that doesn't help.

# Define paths
$folderPath = "my source folder"
$resultPath = "my destination folder"

# Create Word application COM object
$wordApp = New-Object -ComObject Word.Application
$wordApp.Visible = $false  # Set Word to run in the background

# Create FileSystemObject COM object
$fso = New-Object -ComObject Scripting.FileSystemObject

# Get the folder
$folder = $fso.GetFolder($folderPath)

# Loop through each file in the folder
foreach ($file in $folder.Files) {
    # Create a new Word document
    $newDoc = $wordApp.Documents.Add()

    # Explicitly cast the boolean and object parameters as [ref] objects for the InsertFile method
    $confirmConversions = [ref] $false
    $link = [ref] $false
    $attachment = [ref] $false

    # Insert the content of the current file into the new document
    $newDoc.Range().InsertFile($file.Path, [ref] $null, $confirmConversions, $link, $attachment)

    # Optionally, add some additional text or actions to the new document
    $newDoc.Content.InsertBefore("Processed document: " + $file.Name + "`n")

    # Define the document name and save it
    $docName = [System.IO.Path]::Combine($resultPath, ($file.Name -replace ".\w+$", "_processed.docx"))
    $newDoc.SaveAs([ref] $docName, [ref] 16)  # 16 = wdFormatXMLDocument for .docx

    # Close the document after saving
    $newDoc.Close([ref] $false)
}

# Cleanup
$wordApp.Quit()
$fso = $null
[System.GC]::Collect()  # Force garbage collection to release COM objects

Error:

Exception setting "InsertFile": Cannot convert the "False" value of type "bool" to type "Object".
At [my source folder]#ProcessWordDocs.ps1:26 char:5
+     $newDoc.Range().InsertFile($file.Path, [ref] $null, $confirmConve ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodException
    + FullyQualifiedErrorId : RuntimeException

Any ideas on how to solve this?


Solution

  • Here's a way of doing this without using the InsertFile() method.
    You can simply open the PDF with Word, which converts it into a document. After that, all you need to do is save the document to a different path with a new extension:

    # Define paths
    $folderPath = "my source folder"
    $resultPath = "my destination folder"  # make sure the path exists
    
    # Create Word application COM object
    $wordApp = New-Object -ComObject Word.Application
    $wordApp.Visible = $false  # Set Word to run in the background
    
    # Loop through each file in the folder
    foreach ($file in (Get-ChildItem -Path $folderPath -Filter '*.pdf' -File)) {
        # Open the PDF in Word, which converts it into an editable document
        $newDoc = $wordApp.Documents.Open($file.FullName)
    
        # Optionally, add some additional text or actions to the new document
        $newDoc.Content.InsertBefore("Processed document: " + $file.Name + "`r`n")
    
        # Build the new document's full path and save it
        $docName = Join-Path -Path $resultPath -ChildPath ('{0}_processed.docx' -f $file.BaseName)
        $newDoc.SaveAs("$docName", 16)   # 16 = wdFormatXMLDocument for .docx
        # Close the document after saving
        $newDoc.Close()
    }
    
    # Quit Word and clean up the used COM objects
    $wordApp.Quit()
    
    # $newDoc is only set if at least one PDF was processed
    if ($newDoc) { $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($newDoc) }
    $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($wordApp)
    [System.GC]::Collect()
    [System.GC]::WaitForPendingFinalizers()
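
If one of the large PDFs fails to convert, the loop above would leave a hidden Word process running. A minimal sketch of the same open-and-save approach wrapped in try/finally might look like the following. The function names Get-ProcessedDocPath and Convert-PdfFolderToDocx are hypothetical, made up for this example. Setting DisplayAlerts to 0 is meant to keep Word from prompting, but whether it silences the PDF conversion notice on your Office version is an assumption I haven't verified. The "Processed document" header line is left out for brevity.

```powershell
# Hypothetical helper: build the output path for a converted PDF.
function Get-ProcessedDocPath {
    param(
        [Parameter(Mandatory)] [string] $SourceFile,
        [Parameter(Mandatory)] [string] $ResultPath
    )
    # Strip only the last extension, so "a.b.pdf" becomes "a.b_processed.docx"
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($SourceFile)
    Join-Path -Path $ResultPath -ChildPath ('{0}_processed.docx' -f $baseName)
}

# Hypothetical wrapper around the conversion loop; requires Word to be installed.
function Convert-PdfFolderToDocx {
    param(
        [Parameter(Mandatory)] [string] $FolderPath,
        [Parameter(Mandatory)] [string] $ResultPath
    )
    $wordApp = New-Object -ComObject Word.Application
    try {
        $wordApp.Visible = $false
        # 0 = wdAlertsNone; assumed (not verified) to suppress the PDF conversion prompt
        $wordApp.DisplayAlerts = 0

        foreach ($file in (Get-ChildItem -Path $FolderPath -Filter '*.pdf' -File)) {
            # Second argument is ConfirmConversions
            $doc = $wordApp.Documents.Open($file.FullName, $false)
            try {
                $docName = Get-ProcessedDocPath -SourceFile $file.FullName -ResultPath $ResultPath
                $doc.SaveAs("$docName", 16)   # 16 = wdFormatXMLDocument for .docx
            }
            finally {
                # Runs even if SaveAs throws, so each document is closed and released
                $doc.Close()
                $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($doc)
            }
        }
    }
    finally {
        # Runs even if a conversion fails, so no hidden WINWORD.EXE is left behind
        $wordApp.Quit()
        $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($wordApp)
        [System.GC]::Collect()
        [System.GC]::WaitForPendingFinalizers()
    }
}
```

You would call it as Convert-PdfFolderToDocx -FolderPath "my source folder" -ResultPath "my destination folder". Releasing each document inside the loop, rather than only the last one after the loop, keeps the number of live COM references small when the folder contains many files.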