The increase assignment operator (+=) is often used in PowerShell questions and answers on the StackOverflow site to construct a collection of objects, e.g.:
$Collection = @()
1..$Size | ForEach-Object {
    $Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
}
Yet it appears to be a very inefficient operation.
Is it OK to generally state that the increase assignment operator (+=) should be avoided for building an object collection in PowerShell?
Note: 2024-08-30
A new version of PowerShell (see v7.5.0-preview.4) with a major improvement for this issue is about to come out. Although it is probably still recommended to use explicit assignment, the benchmarks in this answer are outdated.
For more details see the helpful addendum from Santiago Squarzon.
Yes, the increase assignment operator (+=) should be avoided for building an object collection
(see also: PowerShell scripting performance considerations).
Apart from the fact that the += operator usually requires more statements (because of the array initialization = @()), the reason it is inefficient is that every time you use the += operator, it will just do:
$Collection = $Collection + $NewObject
Because arrays are immutable in terms of element count, the whole (growing) collection will be recreated with every iteration.
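To see that a new array is created on every addition, here is a minimal sketch (the sample array and names are just for illustration) that checks reference identity before and after a +=:
$Original = 1, 2, 3                                # a plain Object[] array
$Reference = $Original                             # second reference to the same array
$Original += 4                                     # rebuilds the array with the extra element
[object]::ReferenceEquals($Reference, $Original)   # False: $Original now points to a new array
$Original.IsFixedSize                              # True: arrays cannot grow in place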
The correct PowerShell syntax is:
$Collection = 1..$Size | ForEach-Object {
    [PSCustomObject]@{Index = $_; Name = "Name$_"}
}
Note: as with other cmdlets, if there is just one item (iteration), the output will be a scalar and not an array; to force it into an array, you might either use the [Array] type: [Array]$Collection = 1..$Size | ForEach-Object { ... }
or use the Array subexpression operator @( ): $Collection = @(1..$Size | ForEach-Object { ... })
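A small sketch of the difference (assuming $Size is 1 so that only a single object is emitted):
$Size = 1
$Collection = 1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }
$Collection -is [Array]    # False: a single PSCustomObject, not an array
[Array]$Collection = 1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }
$Collection -is [Array]    # True: the type constraint forces an array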
Where it is recommended to not even store the results in a variable ($a = ...) but to immediately pass them into the pipeline to save memory, e.g.:
1..$Size | ForEach-Object {
    [PSCustomObject]@{Index = $_; Name = "Name$_"}
} | Export-Csv .\Outfile.csv
Note: using the System.Collections.ArrayList class could also be considered; this is generally almost as fast as the PowerShell pipeline, but the disadvantage is that it consumes a lot more memory than (properly) using the PowerShell pipeline.
See also: Fastest Way to get a uniquely index item from the property of an array and Array causing 'system.outofmemoryexception'
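For reference, a minimal sketch of the ArrayList approach (the [void] cast suppresses the index that Add() returns):
$Collection = [System.Collections.ArrayList]::new()
1..$Size | ForEach-Object {
    # Add() mutates the list in place, so the collection is not recreated per iteration
    [void]$Collection.Add([PSCustomObject]@{Index = $_; Name = "Name$_"})
}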
To show the relation between the collection size and the decrease in performance, you might check the following test results:
1..20 | ForEach-Object {
    $Size = 1000 * $_
    $Performance = @{Size = $Size}
    $Performance.Pipeline = (Measure-Command {
        $Collection = 1..$Size | ForEach-Object {
            [PSCustomObject]@{Index = $_; Name = "Name$_"}
        }
    }).Ticks
    $Performance.Increase = (Measure-Command {
        $Collection = @()
        1..$Size | ForEach-Object {
            $Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
        }
    }).Ticks
    [pscustomobject]$Performance
} | Format-Table *, @{n='Factor'; e={$_.Increase / $_.Pipeline}; f='0.00'} -AutoSize
Size Increase Pipeline Factor
---- -------- -------- ------
1000 1554066 780590 1.99
2000 4673757 1084784 4.31
3000 10419550 1381980 7.54
4000 14475594 1904888 7.60
5000 23334748 2752994 8.48
6000 39117141 4202091 9.31
7000 52893014 3683966 14.36
8000 64109493 6253385 10.25
9000 88694413 4604167 19.26
10000 104747469 5158362 20.31
11000 126997771 6232390 20.38
12000 148529243 6317454 23.51
13000 190501251 6929375 27.49
14000 209396947 9121921 22.96
15000 244751222 8598125 28.47
16000 286846454 8936873 32.10
17000 323833173 9278078 34.90
18000 376521440 12602889 29.88
19000 422228695 16610650 25.42
20000 475496288 11516165 41.29
Meaning that with a collection size of 20,000 objects, using the += operator is about 40x slower than using the PowerShell pipeline for this.
Apparently some people struggle with correcting a script that already uses the increase assignment operator (+=). Therefore, I have created a little instruction to do so:
Note:
Changing the array initialization <Variable> = @() to something like <Variable> = [Collections.Generic.List[Object]]::new() is not enough to resolve this performance issue, because as soon as you use the + or += operator on it, it will be changed back to an immutable array type.
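A small sketch of that pitfall (the list name is just an example):
$List = [Collections.Generic.List[Object]]::new()
$List.GetType().Name    # List`1
$List += 'NewItem'      # += goes through the + operator, which rebuilds the collection...
$List.GetType().Name    # Object[]  -- back to a fixed-size array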
Remove the <variable> += assignments from the concerned iteration and just leave only the object item. By not assigning the object, the object will simply be put on the pipeline.
ForEach ( ... ) {
    $Array += $Object1
    $Array += $Object2
    ForEach ( ... ) {
        $Array += $Object3
        $Array += Get-Object
    }
}
Is essentially the same as:
ForEach ( ... ) {
    $Object1
    $Object2
    ForEach ( ... ) {
        $Object3
        Get-Object
    }
}
Note: if there is no iteration, there is probably no reason to change your script as it likely only concerns a few additions (to e.g. $Array = @()).
Assign the output of the iteration to the concerned variable, e.g.:
$Array = ForEach ( ... ) { ... }
Note 1: Again, if you want a single object to act as an array, you probably want to use the Array subexpression operator @( ), but you might also consider doing this at the moment you use the array, like: @($Array).Count or ForEach ($Item in @($Array))
Note 2: Again, you're better off not assigning the output at all. Instead, pass the pipeline output directly to the next cmdlet to free up memory: ... | ForEach-Object {...} | Export-Csv .\File.csv
Finally, remove the array initialization <Variable> = @().
For a full example, see:
Note that the same applies for using += to build strings (see: Is there a string concatenation shortcut in PowerShell?) and also for building HashTables like: $HashTable += @{ $NewName = $Value }
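A small sketch of the equivalent fixes for those cases (the variable names are just examples):
# Strings: collect the parts on the pipeline and join them once,
# instead of $Text += "Name$_" in every iteration
$Text = -join (1..$Size | ForEach-Object { "Name$_" })
# HashTables: add entries via the index operator (mutates in place),
# instead of $HashTable += @{ $_ = "Name$_" } which rebuilds the table each time
$HashTable = @{}
1..$Size | ForEach-Object { $HashTable[$_] = "Name$_" }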