sql xml powershell

Losing time with displays and printed content when cleaning XML files


I'm trying to create a PowerShell routine that cleans XML files automatically. The routine itself works: I can clean a single file using several functions and scripts. But I want to run it every time I have a new XML file, so I've added a loop that processes every file in a directory.

Now that I call my routine this way, rows are displayed while it runs, even where I don't use Write-Host, and I lose a lot of time cleaning the XML files.

Here is my code:

param ([string] $sourceDirectory, [string] $targetDirectory, [string] $XSDFileName, [string] $dataSourceName, [string] $databaseName)

clear

function clearLocalVariables{
    #This function clears my local variables
}

function createSQLNodesList{
    param ([string] $dataSourceName,[string] $databaseName)
    #This function creates a list of available and allowed nodes in my XML Files from SQL databases.
}

The following functions check my nodes. This is where the printed rows and the Write-Host output appear when the routine is run more than once:

function isNodeNameValid {
    param ([string] $testedNodeName)

    #    This function returns the result of the nodeAnalysis function.
    #    It selects the list the node is checked against, depending on whether
    #    the node belongs to the aspect of the XML or holds data.
    #    - $testedNodeName is a string representing the XML node analyzed.

    # If the node name is 5 characters long, begins with an A, and is followed by
    # 4 digits ('AXXXX'), then it is data.
    if(($testedNodeName.Length -eq 5) -and ($testedNodeName.Substring(0,1) -eq "A" ) -and ($testedNodeName.Substring(1,4) -match "^[-]?[0-9.]+$")) {
        return nodeAnalysis -nodesList $nodesSQL -testedNodeName $testedNodeName
    #Else, it is in the list for the aspect of the XML.
    } else {
        return nodeAnalysis -nodesList $nodesXML -testedNodeName $testedNodeName
    }
}

function nodeAnalysis {
    param ($nodesList,[string] $testedNodeName)

    # This function is used to analyze each node name given.
    # It compares the name of the node analyzed to each node in the array given as a parameter.
    # - $nodesList is the corresponding array depending on the isNodeNameValid() method.
    # - $testedNodeName is a string representing the XML node analyzed.
    # We compare each node of the node array to the testedNodeName. If the testedNodeName is in this array, the method returns 1.
    foreach($nodeName in $nodesList) {
        if ($testedNodeName -eq $nodeName) {
            return 1
        }
    }

    #If the node does not correspond to any node of the list, then the method returns 0.
    return 0
}
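
As an aside, the 'AXXXX' rule described in the comments above can be sketched with one anchored regular expression, and the list lookup with the -contains operator. This is only a minimal sketch, not the code from the question: the function names Test-DataNodeName and Test-NodeInList are hypothetical, and the pattern assumes exactly four digits, which is stricter than the original test (that one also accepts '.' and a leading '-').

```powershell
# Hypothetical helper: returns $true when the name is 'A' followed by exactly four digits.
# Assumption: "4 digits" is meant strictly, so names like 'A12.4' are rejected.
function Test-DataNodeName {
    param ([string] $Name)

    # -match is case-insensitive, like the -eq "A" comparison in the question.
    return $Name -match '^A[0-9]{4}$'
}

# Hypothetical helper: returns $true when the name is present in the list.
function Test-NodeInList {
    param ([string[]] $NodesList, [string] $Name)

    # -contains compares the name against every element of the array.
    return $NodesList -contains $Name
}

Test-DataNodeName -Name 'A0001'
Test-DataNodeName -Name 'A12.4'
Test-NodeInList -NodesList @('root', 'A0001') -Name 'root'
```

Both helpers return a boolean, so they can be used directly in an if condition instead of comparing against 1 and 0.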

# -- XML Nodes recursive cleaning method -- #

function cleanXMLContent {
    param ($XMLDoc,[int] $endOfLeaf, [int] $boucle)
    #This is the function where I have trouble with the displayed output and efficiency:

    while($endOfFile -ne 1) {
        if($endOfLeaf -eq 1) {
            if($XMLDoc.Name -eq "#document"){
                $endOfFile = 1
            }
            if($XMLDoc.NextSibling) {
                $XMLDoc = $XMLDoc.NextSibling
                $endOfLeaf = 0
            } else {
                $XMLDoc =  $XMLDoc.ParentNode
                $endOfLeaf = 1
            }
        } else {
            if(!(isNodeNameValid -testedNodeName $XMLDoc.Name)) {
                if($XMLDoc.PreviousSibling) {
                    $nodeNameToDelete = $XMLDoc.Name
                    $siblingNodeName = $XMLDoc.PreviousSibling.Name
                    $XMLDoc = $XMLDoc.ParentNode
                    $XMLDoc.RemoveChild($XMLDoc.SelectSingleNode($nodeNameToDelete))
                    $XMLDoc = $XMLDoc.SelectSingleNode($siblingNodeName)
                } else {
                    $nodeNameToDelete = $XMLDoc.Name
                    $XMLDoc = $XMLDoc.ParentNode
                    $XMLDoc.RemoveChild($XMLDoc.SelectSingleNode($nodeNameToDelete))
                }
            } else {
                if($XMLDoc.HasChildNodes) {
                    $XMLDoc = $XMLDoc.FirstChild
                    $endOfLeaf = 0
                } else {
                    if($XMLDoc.NextSibling) {
                        $XMLDoc = $XMLDoc.NextSibling
                        $endOfLeaf = 0
                    } else {
                        if($XMLDoc.ParentNode) {
                            $XMLDoc = $XMLDoc.ParentNode

                            if($XMLDoc.NextSibling) {
                                $endOfLeaf = 1
                            } else {
                                $XMLDoc = $XMLDoc.ParentNode
                                $endOfLeaf = 1
                            }
                        }
                    }
                }
            }
        }
    }
    Write-Host "- Cleaning XML Nodes OK" -ForegroundColor Green
}

function createXSDSchema {
    param ([string] $XSDFileName)
    #This function is used to create XSD corresponding File
}

function cleanFile {
    param ([string] $fileName, [string] $source, [string] $target, [string] $XSDFileName, [string] $dataSourceName, [string] $databaseName)

    # -- Opening XML File -- #
    #Creation of the XML Document iteration path
    #Timestamp formatted as yyyyMMdd_HHmmss
    $date = Get-Date -Format 'yyyyMMdd_HHmmss'

    #Determining the paths of the source and target files.
    $XMLDocPath = $source + $fileName
    $XMLFutureFileNamePreWork = $fileName.Substring(0,$fileName.Length - 4)
    $XMLFuturePath = $target + $XMLFutureFileNamePreWork + "cleaned" #_"+$date

    #Creation of the XML Document
    $XMLDoc = New-Object System.Xml.XmlDocument
    $XMLFile = Resolve-Path($XMLDocPath)

    #Loading of the XML File
    $XMLDoc.Load($XMLFile)
    [XML] $XMLDoc = Get-Content -Path $XMLDocPath

    #If the XML Document exists, then we clean it.
    if($XMLDoc.HasChildNodes) {
        #The XML Document is cleaned.
        cleanXMLContent $XMLDoc.FirstChild -endOfLeaf 0
        Write-Host "- XML Cleaned" -ForegroundColor Green

        #If it is a success, then we save it in a new file.
        #if($AnalysisFinished -eq 1) {
            #Modifying the XSD Attribute

            #setting the XSD name into the XML file
            createXSDSchema -XSDFileName $XSDFileName

            #Creation of the XML Document
            $XMLDoc.Save($XMLFuturePath+".xml")
            Write-Host "- Creation of the new XML file successful at "$XMLFuturePath -ForegroundColor Green

            #Creation of the XSD Corresponding Document
            #createXSDSchema -XMLPath $XMLFuturePath
        #}
    } else {
        Write-Host "Impossible"
    }
}

Here I execute the whole process using the functions above. When I launch each function separately it works, but with many files it displays content and I lose a lot of time:

cd $sourceDirectory
$files = Get-ChildItem $sourceDirectory

# -- Local Variables Cleaning -- #
clearLocalVariables
Write-Host "- Variable cleaning successful" -ForegroundColor Green

# -- SQL Connection -- #
$nodesSQL = createSQLNodesList -dataSourceName $dataSourceName -databaseName $databaseName

foreach($file in $files){
    cleanFile -fileName $file -source $sourceDirectory -target $targetDirectory -XSDFileName $XSDFileName -dataSourceName $dataSourceName -databaseName $databaseName
}

Do you have any idea how to stop this content from being displayed?

I get a lot of blank rows, and the display multiplies the cleaning time by 10 or 15.


Solution

  • Thanks to Ansgar Wiechers, I found a way to speed up my code: I rewrote it recursively. It is now much faster, but the content of the deleted rows was still printed.

    RemoveChild returns the node it removed, and PowerShell writes any uncaptured return value to the output. So, to stop the content of the deleted nodes from being printed on screen, I had to use:

    [void]$RemovedNode.ParentNode.RemoveChild($RemovedNode)
    

    Instead of:

    $RemovedNode.ParentNode.RemoveChild($RemovedNode)
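
For illustration, the recursive approach combined with the [void] cast might look like the sketch below. It is not the code actually used above, only a minimal version under stated assumptions: the function name Remove-InvalidNode, the $allowedNames list, and the sample XML are all hypothetical, and a single allow-list stands in for the SQL and XML node lists from the question.

```powershell
# Hypothetical allow-list; the question builds its lists from SQL and elsewhere.
$allowedNames = @('root', 'A0001', 'A0002')

function Remove-InvalidNode {
    param (
        [System.Xml.XmlNode] $Node,
        [string[]] $AllowedNames
    )

    # Copy the children into an array first, so that removing a child
    # does not disturb the collection being enumerated.
    foreach ($child in @($Node.ChildNodes)) {
        # Only element nodes are checked; text nodes and the like are kept.
        if ($child.NodeType -ne [System.Xml.XmlNodeType]::Element) { continue }

        if ($AllowedNames -contains $child.Name) {
            # Valid node: descend into its children.
            Remove-InvalidNode -Node $child -AllowedNames $AllowedNames
        } else {
            # RemoveChild returns the removed node; the [void] cast discards it,
            # so nothing is written to the output.
            [void] $Node.RemoveChild($child)
        }
    }
}

$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml('<root><A0001>1</A0001><junk><A0002>2</A0002></junk><A0002>3</A0002></root>')

# With the [void] cast the function emits nothing, so $output stays empty.
$output = Remove-InvalidNode -Node $doc -AllowedNames $allowedNames
$doc.OuterXml
```

Assigning the result to $null ($null = $Node.RemoveChild($child)) or piping it to Out-Null suppresses the output in the same way; the cast is simply the form used in the solution above.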