performance powershell pipeline memory-efficient

Is there a way to stream data out faster from a large command?


Let's say I'm using get-childitem c:\*.* -recurse and I am piping it. I have to wait for the whole get-childitem command to complete before the pipe handles it. There are exceptions, such as select -first 2, which magically stops the previous command. Anyhow, is there a way to improve output so it writes right away instead of soaking up a ton of RAM? One idea I have is... (which I know won't work, but it gets the idea across)

[System.IO.File]::ReadLines("$(dir c:\*.* -recurse)")

I know this is a windows thing because Linux will work with data as soon as it shows up. But two different worlds, I know.

My biggest concern is ram usage...

Here is a great example

(1..10000000) | where {$_ -like "*543*"}

this takes my machine about 100 seconds

where

(1..10000000).where({$_ -like "*543*"})

only took 25 seconds.


Solution

  • I have to wait for the whole get-childitem command to complete before the pipe handles it.

    No: The very point of PowerShell's pipeline is to process objects one by one, as they become available, thereby acting as a memory throttle that keeps memory use constant irrespective of the size of the input collection.

    However, what is indeed missing is the ability to stop pipeline processing on demand - which currently only Select-Object -First can do - see this answer of mine.
    There's a longstanding feature request on GitHub that asks for a mechanism to stop a pipeline on demand.
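    To see both behaviors for yourself — streaming one object at a time, and Select-Object -First stopping the upstream command early — here is a minimal sketch:

    ```powershell
    # Select-Object -First stops the upstream command as soon as it has
    # what it needs, so this returns almost instantly even though the
    # range nominally covers 10 million numbers.
    $result = 1..10000000 | Select-Object -First 3
    $result   # -> 1, 2, 3

    # Where-Object, by contrast, lets the pipeline run to completion,
    # but it still receives objects one by one, so memory use stays flat;
    # only elapsed time grows with the size of the input.
    1..10000000 | Where-Object { $_ -eq 10 }   # -> 10 (after a while)
    ```

    Note that only the *stopping* is special to Select-Object -First; the one-object-at-a-time streaming applies to every cmdlet in the pipeline.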


    As an aside: Using the PSv4+ .Where() method is indeed faster than using the Where-Object cmdlet (whose built-in alias is where), but .Where() invariably requires the collection that it operates on to have been loaded into memory in full beforehand.

    However, the .Where() method does have the ability to stop processing the remaining items: passing 'First' as the 2nd argument stops after the first match ('First' is a value of the [System.Management.Automation.WhereOperatorSelectionMode] enumeration); compare the performance of

        (1..1e6).Where({$_ -eq 10})

    to that of

        (1..1e6).Where({$_ -eq 10}, 'First')
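    One way to compare the two variants is with Measure-Command — a sketch; the absolute times will vary by machine:

    ```powershell
    # Without a selection mode, .Where() scans the entire collection...
    $all   = Measure-Command { (1..1e6).Where({ $_ -eq 10 }) }

    # ...whereas 'First' short-circuits after the first match.
    $first = Measure-Command { (1..1e6).Where({ $_ -eq 10 }, 'First') }

    '{0:n0} ms vs. {1:n0} ms' -f $all.TotalMilliseconds, $first.TotalMilliseconds
    ```

    Either way, the 1..1e6 array must be fully constructed in memory before .Where() can run — 'First' only saves you the scan, not the allocation.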


    [1] PowerShell does not use temporary files to ease the memory pressure the way the Unix sort utility does, for instance; my guess is that doing so is not really an option in PowerShell: PowerShell's ability to process live objects (rather than static strings) would present significant serialization / deserialization challenges were temporary files to be used.

    [2] However, 1..10000000 | ... and & { foreach ($i in 1..10000000) { $i } } | ... would work: uniquely among PowerShell's operators, .., the range operator, is implemented as a lazy .NET enumerable, which direct pipeline input and use in a foreach statement can take advantage of.
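    A sketch contrasting the two lazy forms from footnote [2] with one that forces the full array into memory first:

    ```powershell
    # Lazy: the range is enumerated on demand, so -First stops it early
    # and memory use stays constant.
    1..10000000 | Select-Object -First 2

    # Also lazy: foreach enumerates the range without materializing it,
    # and the & { ... } script block emits each value to the pipeline as
    # it goes.
    & { foreach ($i in 1..10000000) { $i } } | Select-Object -First 2

    # Not lazy: capturing the range in a variable (or wrapping it in
    # @(...) or calling a method such as .Where() on it) materializes
    # the full 10-million-element array before anything downstream runs.
    $arr = 1..10000000
    $arr | Select-Object -First 2
    ```

    In all three cases the output is the same (1 and 2); the difference is whether the full range is ever held in memory at once.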