powershellsortingdependenciescomparetopological-sort

How to do a topological sort in PowerShell


For my own implementation of a Build-Module, I am in need to sort an array of functions and classes based on its dependencies. A similar question How to sort array of dependencies? has a helpful hint but it doesn't include any code. The question Topological Sorting using LINQ does include code but only C# based.

Therefore I decided to create simple data set of PSCustomObjects to be able to test my idea with a Minimal, Reproducible Example (in my own answer) before putting it in my full blown project:

$Array = ConvertFrom-Json @'
[
  { "Name": "Function1", "Dependency": ["Function3"] },
  { "Name": "Function2", "Dependency": ["Function3", "Function4", "Function5"] },
  { "Name": "Function3", "Dependency": [] },
  { "Name": "Function4", "Dependency": ["Function1", "Function5"] },
  { "Name": "Function5", "Dependency": ["Function1"] }
]
'@

The task is to sort the objects so that any object that is required by an other object is loaded first, meaning the expected result order should look like:

Name      Dependency
----      ----------
Function3 {}
Function1 {Function3}
Function5 {Function1}
Function4 {Function1, Function5}
Function2 {Function3, Function4, Function5}

I have put my own answer below as possible approach and reuse but welcome any other comment or answer to tackle a topological sort in PowerShell differently.


Solution

  • Despite the attention on the meta issues in my question, apparently nobody caught the design bug in my original (deleted) self-answer.

    In the end, I think that I can conclude that using a comparer for this isn't the correct way, mainly because the comparer<T> class "compares two objects" and for a topological search it needs to handle multiple dependency (edges). Therefore I have created a Sort-Topological to do the job.

    For the specific example in the question:

    $Array | Sort-Topological -Id Name -Dependency Dependency 
    
    Name      Dependency
    ----      ----------
    Function3 {}
    Function1 {Function3}
    Function5 {Function1}
    Function4 {Function1, Function5}
    Function2 {Function3, Function4, Function5}
    

    For my specific use case, the Sort-Topological cmdlet might be used as follows:

    $Script = @'
        Class Class1 : Class3 { }
        Class Class2 : Class4, Class5 { }
        Class Class3 { }
        Class Class4 : Class5, Class3, Class1 { }
        Class Class5 : Class1 { }
    '@
    
    $Ast = [Management.Automation.Language.Parser]::ParseInput($Script, [ref]$null, [ref]$null)
    $Classes = $Ast.EndBlock.Statements
    $Sorted = $Classes | Sort-Topological -Id Name -Dependency { $_.BaseTypes.TypeName.Name }
    $Sorted.Name
    
    Class3
    Class1
    Class5
    Class4
    Class2
    

    Sort-Topological

    Below is the code and an explanation of the first stable version. A full and up-to-date version including a usage help can be found on the PowerShell Gallery and Github.

    Using Namespace System.Management.Automation
    Using Namespace System.Management.Automation.Language
    Using Namespace System.Collections
    Using Namespace System.Collections.Generic
    
    function Sort-Topological {
        [Diagnostics.CodeAnalysis.SuppressMessage('PSUseApprovedVerbs', '', Scope = 'function', Target = '' )]
        [CmdletBinding()]Param(
            [Parameter(ValueFromPipeline = $True, Mandatory = $True)]$InputObject,
            [Parameter(Position = 0, Mandatory = $True)][Alias('DependencyName')]$EdgeName,
            [Parameter(Position = 1)][Alias('NameId')][String]$IdName
        )
        
        function PSError($Exception, $Id = 'TopologicalSortError', $Target = $Vertex, $Category = 'InvalidArgument') {
            $PSCmdlet.ThrowTerminatingError([ErrorRecord]::new($Exception, $Id, $Category, $Target))
        }
        
        if ($EdgeName -is [ScriptBlock]) { # Prevent code injection
            $Ast = [System.Management.Automation.Language.Parser]::ParseInput($EdgeName, [ref]$null, [ref]$null)
            $Expression = $Ast.EndBlock.Statements.PipelineElements.Expression
            While ($Expression -is [MemberExpressionAst] -and $Expression.Member -is [StringConstantExpressionAst]) {
                $Expression = $Expression.Expression
            }
            if ($Expression -isnot [VariableExpressionAst] -or $Expression.VariablePath.UserPath -notin '_', 'PSItem') {
                PSError "The { $Expression } expression should contain safe path." 'InvalidIdExpression' $Expression
            }
        }
        elseif ($Null -ne $IdName -and $IdName -isnot [String]) { $IdName = "$IdName" }
        
        Function FormatId ($Vertex) {
            if ($Vertex -is [ValueType] -or $Vertex -is [String]) { $Value = $Vertex }
            elseif (@($_.PSObject.Properties.Name).Contains($IdName)) { $Value = $_.PSObject.Properties[$IdName].Value }
            else { return "[$(@($List).IndexOf($Vertex))]" }
            if ($Value -is [String]) { """$Value""" } else { $Value }
        }
        
        $ById = $Null
        $Sorted = [List[Object]]::new()
        if ($Input) { $List = $Input } else { $List = $InputObject }
        if ($List -isnot [iEnumerable]) { return $List }
        $EdgeCount = 0
        while ($Sorted.get_Count() -lt $List.get_Count()) {
            $Stack = [Stack]::new()
            $Enumerator = $List.GetEnumerator()
            while ($Enumerator.MoveNext()) {
                $Vertex = $Enumerator.Current
                if($Sorted.Contains($Vertex)) { continue }
                $Edges = [List[Object]]::new()
                if ($EdgeName -is [ScriptBlock]) { $Edges = $Vertex.foreach($EdgeName) }
                else { $Edges = $Vertex.PSObject.Properties[$EdgeName].Value }
                if ($Edges -isnot [iList]) { $Edges = @($Edges) }
                if ($Null -eq $ById -and $Edges.Count -gt 0) {
                    if ($Edges[0] -is [ValueType] -or $Edges[0] -is [String]) {
                        if (-not $IdName) { PSError 'Dependencies by id require the IdName parameter.' 'MissingIdName' }
                        $ById = @{}
                        foreach ($Item in $List) { $ById[$Item.PSObject.Properties[$IdName].Value] = $Item }
                    } else { $ById = $False }
                }
                if ($ById) {
                    $Ids = $Edges; $Edges = [List[Object]]::new()
                    foreach ($Id in $Ids) {
                        if ($Null -eq $Id) { } elseif ($ById.contains($Id)) { $Edges.Add($ById[$Id]) }
                        else{ PSError "Unknown vertex id: $(FormatId $Id)." 'UnknownVertex' }
                    }
                }
                if ($Stack.Count -gt 0 -or $Edges.Count -eq $EdgeCount) {
                    $ExistsAt = if ($Stack.Count -gt 0) { @($Stack.Current).IndexOf($Vertex) + 1 }
                    $Stack.Push($Enumerator)
                    if ($ExistsAt -gt 0) {
                        $Cycle = (@($Stack)[0..$ExistsAt].Current).foreach{ FormatId $_ } -Join ', '
                        PSError "Circular dependency: $Cycle." 'CircularDependency'
                    }
                    $Enumerator = $Edges.GetEnumerator()
                }
            }
            if ($Stack.Count -gt 0) {
                $Enumerator = $Stack.Pop()
                $Vertex = $Enumerator.Current
                if ($Vertex -is [ValueType] -or $Vertex -is [String]) { $Vertex = $ById[$Vertex] }
                if (-not $Sorted.Contains($Vertex)) { $Sorted.Add($Vertex) }
            }
            else { $EdgeCount++ }
        }
        $Sorted
    }
    

    Explanation