Tags: .net, powershell, performance, get-childitem, powershell-7.4

Collections and the PowerShell Pipeline


I have a regular requirement to remove a large number of small files (sometimes >100,000) from a server. These files contain monitoring data from remote sensors and are generated on different schedules by different devices. Unfortunately, I can't optimise the input.

[Edit] Updated the code to the version that originally sparked the question. I had posted a later version that had similar problems.

I can do something like

$filePath = '\my\path'
$CutoffDate = (Get-Date).AddDays(-30) # Calculate the date thirty days ago.
Get-ChildItem -File -Path $filePath -Recurse | Where-Object { $_.LastWriteTime -le $CutoffDate } | Remove-Item

This works well for small numbers of files, but at the scale I'm working with it can use a huge amount of memory and take a very long time.

It appears that the Get-ChildItem cmdlet is building the complete collection before submitting it to the pipeline.

I can't filter on date with Get-ChildItem, so every file in the target folders is read, and there can be millions.


Is my assumption correct about the initial collection?

Is there some way to modify the pipeline operation so that each element is submitted to the pipeline as it is found?

Alternatively, is there some way to move the date filtering to Get-ChildItem so that the initial search is reduced in size?
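For reference, one quick way to probe the streaming question (a sketch; `'\my\path'` is the placeholder path from above): if the first match prints almost immediately while a large scan is still running, items are being emitted to the pipeline one at a time rather than collected first.

```powershell
# Streaming probe (sketch): '\my\path' is the placeholder path from above.
# On a large tree, an immediate result suggests items stream one at a time.
Get-ChildItem -File -Path '\my\path' -Recurse |
    Where-Object { $_.LastWriteTime -le (Get-Date).AddDays(-30) } |
    Select-Object -First 1
```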


Solution

  • Get-ChildItem is pretty slow; it's a known issue. If you want faster code, you need to use the .NET APIs directly. The code below should be considerably faster than your current approach and should consume less memory. It is worth noting that this implementation will not exclude hidden files and folders; if you need to exclude them, a new condition has to be added (essentially, check .Attributes.HasFlag([System.IO.FileAttributes]::Hidden) and, if it's set, continue to skip the entry). Please leave feedback in that case and I'll update my answer.

    $filePath = Get-Item '\my\path'
    $CutoffDate = (Get-Date).AddDays(-30) # Calculate date thirty days ago.
    
    $enum = $filePath.
        EnumerateFiles('*', [System.IO.SearchOption]::AllDirectories).
        GetEnumerator()
    
    while ($true) {
        try {
            if (-not $enum.MoveNext()) {
                break
            }
        }
        catch {
            # ignore inaccessible folders, go next
            continue
        }
    
        if ($enum.Current.LastWriteTime -le $CutoffDate) {
            try {
                $enum.Current.Delete()
            }
            catch {
            # you can handle files that couldn't be deleted here
            # (possible permission issue); otherwise leave this block
            # empty to ignore any error
            }
        }
    }
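The hidden-file exclusion mentioned above can be sketched as follows (a fragment, not a complete script): it would go inside the `while` loop, before the date comparison.

```powershell
# Sketch of the hidden-file check described above; place it inside the
# while loop, before comparing LastWriteTime against the cutoff.
if ($enum.Current.Attributes.HasFlag([System.IO.FileAttributes]::Hidden)) {
    continue   # hidden entry; move on to the next file
}
```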
    

    EDIT - Just noticed the powershell-7.4 tag, in which case there is a much better and easier approach using the EnumerationOptions class (note that this class isn't available in .NET Framework).

    $filePath = Get-Item '\my\path'
    $CutoffDate = (Get-Date).AddDays(-30) # Calculate date thirty days ago.
    
    $options = [System.IO.EnumerationOptions]@{
        IgnoreInaccessible    = $true
        RecurseSubdirectories = $true
        # Remove below line if you want to delete hidden files
        AttributesToSkip      = [System.IO.FileAttributes]::Hidden
    }
    foreach ($file in $filePath.EnumerateFiles('*', $options)) {
        if ($file.LastWriteTime -le $CutoffDate) {
            $file.Delete()
        }
    }
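If you want some feedback on what happened, a variant of the same loop can count deletions and warn on failures instead of stopping. This is a sketch built on the code above; `'\my\path'` is still the placeholder path from the question, and the counter and warning message are my additions.

```powershell
# Sketch: same EnumerationOptions approach, but counting deletions and
# warning (rather than failing) on files that can't be removed.
# '\my\path' is the placeholder path from the question.
$filePath   = Get-Item '\my\path'
$CutoffDate = (Get-Date).AddDays(-30)

$options = [System.IO.EnumerationOptions]@{
    IgnoreInaccessible    = $true
    RecurseSubdirectories = $true
}

$deleted = 0
foreach ($file in $filePath.EnumerateFiles('*', $options)) {
    if ($file.LastWriteTime -le $CutoffDate) {
        try {
            $file.Delete()
            $deleted++
        }
        catch {
            Write-Warning "Could not delete $($file.FullName): $_"
        }
    }
}
"Removed $deleted file(s)"
```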