For my own implementation of a Build-Module
, I am in need to sort an array of functions and classes based on its dependencies. A similar question How to sort array of dependencies? has a helpful hint but it doesn't include any code. The question Topological Sorting using LINQ does include code but only C# based.
Therefore I decided to create simple data set of PSCustomObjects to be able to test my idea with a Minimal, Reproducible Example (in my own answer) before putting it in my full blown project:
$Array = ConvertFrom-Json @'
[
{ "Name": "Function1", "Dependency": ["Function3"] },
{ "Name": "Function2", "Dependency": ["Function3", "Function4", "Function5"] },
{ "Name": "Function3", "Dependency": [] },
{ "Name": "Function4", "Dependency": ["Function1", "Function5"] },
{ "Name": "Function5", "Dependency": ["Function1"] }
]
'@
The task is to sort the objects so that any object that is required by an other object is loaded first, meaning the expected result order should look like:
Name Dependency
---- ----------
Function3 {}
Function1 {Function3}
Function5 {Function1}
Function4 {Function1, Function5}
Function2 {Function3, Function4, Function5}
I have put my own answer below as possible approach and reuse but welcome any other comment or answer to tackle a topological sort in PowerShell differently.
Despite the attention on the meta issues in my question, apparently nobody caught the design bug in my original (deleted) self-answer.
In the end, I think that I can conclude that using a comparer for this isn't the correct way, mainly because the comparer<T>
class "compares two objects" and for a topological search it needs to handle multiple dependency (edges). Therefore I have created a Sort-Topological
to do the job.
For the specific example in the question:
$Array | Sort-Topological -Id Name -Dependency Dependency
Name Dependency
---- ----------
Function3 {}
Function1 {Function3}
Function5 {Function1}
Function4 {Function1, Function5}
Function2 {Function3, Function4, Function5}
For my specific use case, the Sort-Topological
cmdlet might be used as follows:
$Script = @'
Class Class1 : Class3 { }
Class Class2 : Class4, Class5 { }
Class Class3 { }
Class Class4 : Class5, Class3, Class1 { }
Class Class5 : Class1 { }
'@
$Ast = [Management.Automation.Language.Parser]::ParseInput($Script, [ref]$null, [ref]$null)
$Classes = $Ast.EndBlock.Statements
$Sorted = $Classes | Sort-Topological -Id Name -Dependency { $_.BaseTypes.TypeName.Name }
$Sorted.Name
Class3
Class1
Class5
Class4
Class2
Below is the code and an explanation of the first stable version. A full and up-to-date version including a usage help can be found on the PowerShell Gallery and Github.
Using Namespace System.Management.Automation
Using Namespace System.Management.Automation.Language
Using Namespace System.Collections
Using Namespace System.Collections.Generic
function Sort-Topological {
[Diagnostics.CodeAnalysis.SuppressMessage('PSUseApprovedVerbs', '', Scope = 'function', Target = '' )]
[CmdletBinding()]Param(
[Parameter(ValueFromPipeline = $True, Mandatory = $True)]$InputObject,
[Parameter(Position = 0, Mandatory = $True)][Alias('DependencyName')]$EdgeName,
[Parameter(Position = 1)][Alias('NameId')][String]$IdName
)
function PSError($Exception, $Id = 'TopologicalSortError', $Target = $Vertex, $Category = 'InvalidArgument') {
$PSCmdlet.ThrowTerminatingError([ErrorRecord]::new($Exception, $Id, $Category, $Target))
}
if ($EdgeName -is [ScriptBlock]) { # Prevent code injection
$Ast = [System.Management.Automation.Language.Parser]::ParseInput($EdgeName, [ref]$null, [ref]$null)
$Expression = $Ast.EndBlock.Statements.PipelineElements.Expression
While ($Expression -is [MemberExpressionAst] -and $Expression.Member -is [StringConstantExpressionAst]) {
$Expression = $Expression.Expression
}
if ($Expression -isnot [VariableExpressionAst] -or $Expression.VariablePath.UserPath -notin '_', 'PSItem') {
PSError "The { $Expression } expression should contain safe path." 'InvalidIdExpression' $Expression
}
}
elseif ($Null -ne $IdName -and $IdName -isnot [String]) { $IdName = "$IdName" }
Function FormatId ($Vertex) {
if ($Vertex -is [ValueType] -or $Vertex -is [String]) { $Value = $Vertex }
elseif (@($_.PSObject.Properties.Name).Contains($IdName)) { $Value = $_.PSObject.Properties[$IdName].Value }
else { return "[$(@($List).IndexOf($Vertex))]" }
if ($Value -is [String]) { """$Value""" } else { $Value }
}
$ById = $Null
$Sorted = [List[Object]]::new()
if ($Input) { $List = $Input } else { $List = $InputObject }
if ($List -isnot [iEnumerable]) { return $List }
$EdgeCount = 0
while ($Sorted.get_Count() -lt $List.get_Count()) {
$Stack = [Stack]::new()
$Enumerator = $List.GetEnumerator()
while ($Enumerator.MoveNext()) {
$Vertex = $Enumerator.Current
if($Sorted.Contains($Vertex)) { continue }
$Edges = [List[Object]]::new()
if ($EdgeName -is [ScriptBlock]) { $Edges = $Vertex.foreach($EdgeName) }
else { $Edges = $Vertex.PSObject.Properties[$EdgeName].Value }
if ($Edges -isnot [iList]) { $Edges = @($Edges) }
if ($Null -eq $ById -and $Edges.Count -gt 0) {
if ($Edges[0] -is [ValueType] -or $Edges[0] -is [String]) {
if (-not $IdName) { PSError 'Dependencies by id require the IdName parameter.' 'MissingIdName' }
$ById = @{}
foreach ($Item in $List) { $ById[$Item.PSObject.Properties[$IdName].Value] = $Item }
} else { $ById = $False }
}
if ($ById) {
$Ids = $Edges; $Edges = [List[Object]]::new()
foreach ($Id in $Ids) {
if ($Null -eq $Id) { } elseif ($ById.contains($Id)) { $Edges.Add($ById[$Id]) }
else{ PSError "Unknown vertex id: $(FormatId $Id)." 'UnknownVertex' }
}
}
if ($Stack.Count -gt 0 -or $Edges.Count -eq $EdgeCount) {
$ExistsAt = if ($Stack.Count -gt 0) { @($Stack.Current).IndexOf($Vertex) + 1 }
$Stack.Push($Enumerator)
if ($ExistsAt -gt 0) {
$Cycle = (@($Stack)[0..$ExistsAt].Current).foreach{ FormatId $_ } -Join ', '
PSError "Circular dependency: $Cycle." 'CircularDependency'
}
$Enumerator = $Edges.GetEnumerator()
}
}
if ($Stack.Count -gt 0) {
$Enumerator = $Stack.Pop()
$Vertex = $Enumerator.Current
if ($Vertex -is [ValueType] -or $Vertex -is [String]) { $Vertex = $ById[$Vertex] }
if (-not $Sorted.Contains($Vertex)) { $Sorted.Add($Vertex) }
}
else { $EdgeCount++ }
}
$Sorted
}
The PSError
function is a helper to properly throw an error to the caller using the ThrowTerminatingError
method. For details see this helpful answer from mklement0.
The code in the condition if ($EdgeName -is [ScriptBlock]) { ...
prevents code injection. The $EdgeName
(alias $DependencyName
) accepts any object including a ScriptBlock
that contains a property path in the form of $_.<verbatim path>
, e.g.: $_.BaseTypes.TypeName.Name
. The code makes sure that the path contains the $_
or $PSItem
followed by a dot-notation path containing only string constants.
The FormatId
function helps to display the object identification in (circular) error messages.
The main topological sort code is based on Kahn's algorithm where $EdgeCount
is used to select any "start nodes" that have no incoming edges and insert them into the $Sorted
list.
The while ($Sorted.get_Count() -lt $List.get_Count()) { ...
continue until the $Sorted
list has as many objects as the initially provided $InputObject
list.
The Kahn's algorithm is recursive. Knowing that PowerShell functions are rather expensive (see this answer), it is better to avoid recursive function, instead it is faster to build your own stack, see this this helpful answer from Santiago Squarzon.
$Enumerator
on the stack that holds both the current item and the pointer to move to the next item (MoveNext
)if($Sorted.Contains($Vertex)) { continue }
means if the current object is already in the sorted list, continue to the next
$Edges = [List[Object]]::new()
is filled with all the objects where the current object is depended on.
In case the depended objects are identified by object (see user help), the code contained by if ($Null -eq $ById -and $Edges.Count -gt 0) { ...
will create a hash table mapping (contained by $ById
) to quickly access the actual object. Note that $ById
might contain the following value types:
$Null
, $ById is not initialized$False
, The objects are linked by objectif the object dependencies are by property (id), the code within the if ($ById) { ...
condition builds a new list of dependencies ($Edge) that directly refer to the actual objects.
The code within if ($Stack.Count -gt 0 -or $Edges.Count -eq $EdgeCount) { ...
, adds the depended objects of the current node to the stack if there are any depended objects ($Stack.Count
) or in case the current object is a "start node" ($Edges.Count -eq $EdgeCount
).
$ExistsAt = if ($Stack.Count -gt 0) { @($Stack.Current).IndexOf($Vertex) + 1 }
. If the current object is already on the stack, there is apparently a circular dependency and the script should terminate to prevent an infinitive loop.
If there are objects left on the stack (if ($Stack.Count -gt 0) { ...
), process them (until $Sorted.get_Count()
equals $List.get_Count()
).