I'm looking for some performance advice. The code below works fine and serves its purpose, but it's terribly slow.
The result of the code is a hashtable of hashtables that I need to use as a lookup table later on.
Background:
- $ccDb is an array of around 200k items; the properties are companyCd, costCenterNbr, costCenterShortNm, costCenterLongDescr
- costCenterNbr is contained in companyCd, meaning each companyCd can have multiple costCenterNbr
- a companyCd can contain any number of costCenterNbr
- costCenterNbr values are unique, and the same goes for companyCd
- costCenterShortNm and costCenterLongDescr are correlated with costCenterNbr
The Issue:
This map has to be constructed on each run of my script because the information is taken from SQL tables (which change all the time). Building this map sometimes takes up to 15 minutes on a pretty good server.
The question:
Do you see a way this code could be improved for a faster / more efficient execution?
$ccMap = @{}
foreach ($line in $ccDb) {
    $companyCd = $line.companyCd.trim()
    $costCenterNbr = $line.costCenterNbr.trim()
    $costCenterShortNm = $line.CostCenterShortNm.trim()
    $costCenterLongDescr = $line.CostCenterLongDescr.trim()

    $coceMap = @{
        $costCenterNbr = @{
            shortDesc = $costCenterShortNm
            longDesc  = $costCenterLongDescr
        }
    }

    if ($ccMap.ContainsKey($companyCd)) {
        $ccMap[$companyCd] += $coceMap
    }
    else {
        $ccMap.Add($companyCd, $coceMap)
    }
}
I'm sorry for the long explanation, but I feel like it's better to give the most information up front. Any help is very much appreciated.
Edit, with measurements for reference: thanks a lot Mathias R. Jessen, his excellent code brought the build time from 7 minutes down to 5 seconds!
Avoid += in tight loops

Here's your biggest sink:

$ccMap[$companyCd] += $coceMap

When you add one hashtable to another using + (or +=, for that matter), PowerShell creates a whole new hashtable:
# Create two different hashtables
$A = @{ Key1 = 'Value1' }
$B = @{ Key2 = 'Value2' }
# Let's save a second reference to the first table
$remember = $A
# Now let's use += to merge the two:
$A += $B
Run this and you'll find $B and $remember are unchanged, but $A has both keys - and must therefore be a new hashtable.
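The flip side, sketched below to illustrate the point, is that hashtables are reference types: mutating one in place with the indexer is visible through every variable that points at it, and no copy is made.

```powershell
# Both variables point at the same hashtable object
$A = @{ Key1 = 'Value1' }
$remember = $A

# Indexer assignment mutates in place; no new hashtable is created...
$A['Key2'] = 'Value2'

# ...so the change is visible through the other reference too
$remember.Count   # 2
```

This is exactly why the fixed loop below is fast: it grabs a reference to the nested table once and mutates it in place, instead of rebuilding it with +=.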
To get around this performance penalty, skip the construction of $coceMap completely, and reverse the order (construct the hashtable first if not present, then assign):
$ccMap = @{}
foreach ($line in $ccDb) {
    $companyCd = $line.companyCd.trim()
    $costCenterNbr = $line.costCenterNbr.trim()
    $costCenterShortNm = $line.CostCenterShortNm.trim()
    $costCenterLongDescr = $line.CostCenterLongDescr.trim()

    # Create a new hashtable if none exists, otherwise retrieve the existing one
    if ($ccMap.ContainsKey($companyCd)) {
        $coceMap = $ccMap[$companyCd]
    }
    else {
        $coceMap = $ccMap[$companyCd] = @{}
    }

    $coceMap[$costCenterNbr] = @{
        shortDesc = $costCenterShortNm
        longDesc  = $costCenterLongDescr
    }
}
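Once built, lookups against the nested map are constant-time dictionary indexing. A minimal sketch of a lookup follows; the company and cost center codes are made-up placeholders, substitute real companyCd / costCenterNbr values.

```powershell
# Hypothetical keys for illustration only
$company    = '001'
$costCenter = '12345'

if ($ccMap.ContainsKey($company) -and $ccMap[$company].ContainsKey($costCenter)) {
    $entry = $ccMap[$company][$costCenter]
    '{0}: {1}' -f $entry.shortDesc, $entry.longDesc
}
```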
Here's a simplified example of the difference += makes, against 10000 items with 50 distinct keys:
$data = @(
    1..10000 | Select-Object @{Name='Company';Expression={Get-Random -Maximum 50}},
                             @{Name='CostCenter';Expression={Get-Random}}
)

@(
    Measure-Command {
        $map = @{}
        foreach ($line in $data) {
            $entry = @{
                $line.CostCenter = @{
                    Value = 123
                }
            }
            if ($map.ContainsKey($line.Company)) {
                $map[$line.Company] += $entry
            }
            else {
                $map[$line.Company] = $entry
            }
        }
    }
    Measure-Command {
        $map = @{}
        foreach ($line in $data) {
            if ($map.ContainsKey($line.Company)) {
                $entry = $map[$line.Company]
            }
            else {
                $entry = $map[$line.Company] = @{}
            }
            $entry[$line.CostCenter] = @{
                Value = 123
            }
        }
    }
) | Select-Object TotalMilliseconds
Which on my laptop gives:
TotalMilliseconds
-----------------
306.4218
47.8164
There are a number of ways to profile the runtime behavior of PowerShell, but here's my personal first choice: PSProfiler (disclaimer: I'm the maintainer of PSProfiler):

Install-Module PSProfiler -Scope CurrentUser

Use Measure-Script the same way you would Measure-Command:

Measure-Script {
    $map = @{}

    foreach ($line in $data) {
        $entry = @{
            $line.CostCenter = @{
                Value = 123
            }
        }

        if ($map.ContainsKey($line.Company)) {
            $map[$line.Company] += $entry
        }
        else {
            $map[$line.Company] = $entry
        }
    }
}
Anonymous ScriptBlock
Count Line Time Taken Statement
----- ---- ---------- ---------
0 1 00:00.0000000 {
1 2 00:00.0000187 $map = @{}
0 3 00:00.0000000
0 4 00:00.0000000 foreach($line in $data){
10000 5 00:00.0635585 $entry = @{
0 6 00:00.0000000 $line.CostCenter = @{
0 7 00:00.0000000 Value = 123
0 8 00:00.0000000 }
0 9 00:00.0000000 }
0 10 00:00.0000000
0 11 00:00.0000000 if($map.ContainsKey($line.Company)){
9950 12 00:00.3965227 $map[$line.Company] += $entry
0 13 00:00.0000000 }
0 14 00:00.0000000 else {
50 15 00:00.0002810 $map[$line.Company] = $entry
0 16 00:00.0000000 }
0 17 00:00.0000000 }
0 18 00:00.0000000 }
Observe that line 12 took up the most total execution time, significantly more than any other:
9950 12 00:00.3965227 $map[$line.Company] += $entry