I'm curious to test out the performance/usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files and so came up with the following test:
New-Item "000" -ItemType Directory -Force   # Move the old zip files in here
foreach ($i in $zipfiles) {
    $name = $i -split ".zip"
    Start-Job -scriptblock {
        7z.exe x -o"$name" .\$name
        Move-Item $i 000\ -Force
        7z.exe a $i .\$name\*.*
    }
}
The problem with this is that it would start jobs for all 100 zip files, which would probably be too much, so I want to set a value $numjobs, say 5, which I can change, such that only $numjobs jobs will be started at the same time, and then the script will check that all 5 jobs have ended before the next block of 5 starts. I'd like to then watch the CPU and memory depending on the value of $numjobs.
How would I tell a loop only to run 5 times, then wait for the Jobs to finish before continuing?
I see that it's easy to wait for jobs to finish:
$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
$jobs | Receive-Job -Wait -AutoRemoveJob
but how might I wait for Start-Process tasks to end?
Although I would like to use ForEach-Object -Parallel, the enterprises that I work in will be solidly tied to PowerShell 5.1 for the next 3-4 years, I expect, with no chance to install PowerShell 7.x (although I would be curious to test with ForEach-Object -Parallel on my home system to compare all approaches).
ForEach-Object -Parallel and Start-ThreadJob have built-in functionality to limit the number of threads that can run at the same time. The same applies to runspaces through a RunspacePool, which is what both cmdlets use behind the scenes.
Start-Job does not offer such functionality because each job runs in a separate process, as opposed to the cmdlets mentioned before, which run in different threads all in the same process. I would also personally not consider it a parallelism alternative: it is pretty slow, and in most cases a linear loop will be faster. Serialization and deserialization can be a problem in some cases too.
ForEach-Object -Parallel and Start-ThreadJob both offer the -ThrottleLimit parameter to limit concurrency:
$dir = (New-Item "000" -ItemType Directory -Force).FullName

# ForEach-Object -Parallel (PowerShell 7+)
$zipfiles | ForEach-Object -Parallel {
    $name = [IO.Path]::GetFileNameWithoutExtension($_)
    # 7-Zip expects the output folder attached to -o (no space), followed by the archive
    7z.exe x "-o$name" $_
    Move-Item $_ $using:dir -Force
    7z.exe a $_ .\$name\*.*
} -ThrottleLimit 5

# Start-ThreadJob
$jobs = foreach ($i in $zipfiles) {
    Start-ThreadJob {
        $name = [IO.Path]::GetFileNameWithoutExtension($using:i)
        7z.exe x "-o$name" $using:i
        Move-Item $using:i $using:dir -Force
        7z.exe a $using:i .\$name\*.*
    } -ThrottleLimit 5
}
$jobs | Receive-Job -Wait -AutoRemoveJob
A RunspacePool offers the same functionality, either through its .SetMaxRunspaces(Int32) method or by targeting one of the RunspaceFactory.CreateRunspacePool overloads that take a maxRunspaces limit as an argument:
$dir   = (New-Item "000" -ItemType Directory -Force).FullName
$limit = 5
$iss   = [initialsessionstate]::CreateDefault2()
# At most $limit runspaces will run at the same time
$pool  = [runspacefactory]::CreateRunspacePool(1, $limit, $iss, $Host)
$pool.ThreadOptions = [Management.Automation.Runspaces.PSThreadOptions]::ReuseThread
$pool.Open()

$tasks = foreach ($i in $zipfiles) {
    $ps = [powershell]::Create().AddScript({
        param($path, $dir)

        $name = [IO.Path]::GetFileNameWithoutExtension($path)
        # 7-Zip expects the output folder attached to -o (no space), followed by the archive
        7z.exe x "-o$name" $path
        Move-Item $path $dir -Force
        7z.exe a $path .\$name\*.*
    }).AddParameters(@{ path = $i; dir = $dir })
    $ps.RunspacePool = $pool

    @{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
}

# Collect the output of each task and release its resources
foreach($task in $tasks) {
    $task['Instance'].EndInvoke($task['AsyncResult'])
    $task['Instance'].Dispose()
}
$pool.Dispose()
Note that in all examples it's unclear whether the 7-Zip commands are correct; this answer aims to demonstrate how async is done in PowerShell, not how to zip files or folders.
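If Start-Job has to be used despite its overhead, the batch-of-five approach described in the question can be sketched as follows. This is only a minimal sketch: Invoke-JobBatch is a hypothetical helper name (not a built-in cmdlet), each item is handed to the job as its first argument, and the squaring script block stands in for the real per-file work.

```powershell
# Hypothetical helper (not part of PowerShell): runs Start-Job in fixed-size batches.
function Invoke-JobBatch {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $BatchSize = 5
    )

    for ($start = 0; $start -lt $InputObject.Count; $start += $BatchSize) {
        $end = [Math]::Min($start + $BatchSize, $InputObject.Count) - 1

        # Start one job per item in the current batch; the item is passed as the first argument.
        $jobs = foreach ($item in $InputObject[$start..$end]) {
            Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item
        }

        # Block until every job in the batch has finished, emit its output, then remove the jobs.
        $jobs | Receive-Job -Wait -AutoRemoveJob
    }
}

# Usage: square ten numbers, with at most 5 jobs alive at a time.
$results = Invoke-JobBatch -InputObject (1..10) -BatchSize 5 -ScriptBlock {
    param($n)
    $n * $n
}
```

Each batch waits for its slowest job before the next batch starts, so a rolling limit such as -ThrottleLimit will usually keep the CPU busier than this approach.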
Below is a helper function that can simplify the process of parallel invocations. It tries to emulate ForEach-Object -Parallel and is compatible with PowerShell 5.1, though it shouldn't be taken as a robust solution:
NOTE: This Q&A offers a much better and more robust alternative to the function below.
using namespace System.Management.Automation
using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Generic

function Invoke-Parallel {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline, DontShow)]
        [object] $InputObject,

        [Parameter(Mandatory, Position = 0)]
        [scriptblock] $ScriptBlock,

        [Parameter()]
        [int] $ThrottleLimit = 5,

        [Parameter()]
        [hashtable] $ArgumentList
    )

    begin {
        $iss = [initialsessionstate]::CreateDefault2()
        if($PSBoundParameters.ContainsKey('ArgumentList')) {
            foreach($argument in $ArgumentList.GetEnumerator()) {
                $iss.Variables.Add([SessionStateVariableEntry]::new($argument.Key, $argument.Value, ''))
            }
        }
        $pool  = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit, $iss, $Host)
        $tasks = [List[hashtable]]::new()
        $pool.ThreadOptions = [PSThreadOptions]::ReuseThread
        $pool.Open()
    }
    process {
        try {
            $ps = [powershell]::Create().AddScript({
                $args[0].InvokeWithContext($null, [psvariable]::new("_", $args[1]))
            }).AddArgument($ScriptBlock.Ast.GetScriptBlock()).AddArgument($InputObject)

            $ps.RunspacePool = $pool
            $invocationInput = [PSDataCollection[object]]::new(1)
            $invocationInput.Add($InputObject)

            $tasks.Add(@{
                Instance    = $ps
                AsyncResult = $ps.BeginInvoke($invocationInput)
            })
        }
        catch {
            $PSCmdlet.WriteError($_)
        }
    }
    end {
        try {
            foreach($task in $tasks) {
                $task['Instance'].EndInvoke($task['AsyncResult'])
                if($task['Instance'].HadErrors) {
                    $task['Instance'].Streams.Error
                }
                $task['Instance'].Dispose()
            }
        }
        catch {
            $PSCmdlet.WriteError($_)
        }
        finally {
            if($pool) { $pool.Dispose() }
        }
    }
}
An example of how it works:
# Hashtable Key becomes the Variable Name inside the Runspace!
$outsideVariables = @{ Message = 'Hello from {0}' }
0..10 | Invoke-Parallel {
    "[Item $_] - " + $message -f [runspace]::DefaultRunspace.InstanceId
    Start-Sleep 5
} -ArgumentList $outsideVariables -ThrottleLimit 3
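The question also asks how to wait for Start-Process tasks, which the examples above don't cover. One possible approach, sketched here under assumptions, is to start each process with -PassThru, keep the returned process objects, and poll until a slot frees up. Start-ThrottledProcess and its parameter names are hypothetical (not built-in), and the usage launches the current PowerShell executable as a stand-in for 7z.exe.

```powershell
# Hypothetical helper (not a built-in cmdlet): starts one process per argument
# string, keeping at most $Limit of them running at the same time.
function Start-ThrottledProcess {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [Parameter(Mandatory)] [string[]] $ArgumentSet,
        [int] $Limit = 5
    )

    $running = [System.Collections.Generic.List[System.Diagnostics.Process]]::new()
    $all     = [System.Collections.Generic.List[System.Diagnostics.Process]]::new()

    foreach ($arguments in $ArgumentSet) {
        # Poll until fewer than $Limit processes are still running.
        while ($running.Count -ge $Limit) {
            Start-Sleep -Milliseconds 200
            $null = $running.RemoveAll({ param($p) $p.HasExited })
        }

        # -PassThru returns the Process object so it can be tracked and waited on.
        $process = Start-Process -FilePath $FilePath -ArgumentList $arguments -PassThru
        $running.Add($process)
        $all.Add($process)
    }

    # Wait for whatever is still running, then emit all process objects.
    foreach ($process in $all) { $process.WaitForExit() }
    $all
}

# Usage: launch six short-lived child shells, at most three at a time.
$shell        = (Get-Process -Id $PID).Path
$argumentSets = 1..6 | ForEach-Object { '-NoProfile -Command exit' }
$finished     = Start-ThrottledProcess -FilePath $shell -ArgumentSet $argumentSets -Limit 3
```

For the zip scenario, $FilePath would be 7z.exe and each argument string one 7-Zip command line. Unlike the batch approach, this starts a new process as soon as any running one exits.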