Tags: list, powershell, sorting, unique, directoryinfo

In PowerShell, how do I sort a Collections.Generic.List of DirectoryInfo?


I want a list of the unique directories that contain a file matching subjectPattern. I can get the list, but to get the unique directories, I need to sort it. But because the list is of type Collections.Generic.List[DirectoryInfo], I'm having trouble finding a working API.

function Get-Containers([Parameter(Mandatory)][string]$subjectPattern) {
    #NOTE: The class for directories is System.IO.DirectoryInfo, the class for files is System.IO.FileInfo
    $fatList = New-Object Collections.Generic.List[System.IO.DirectoryInfo]    
    $result = New-Object Collections.Generic.List[System.IO.DirectoryInfo]
    foreach ($leafName in (get-childitem -recurse -force -path . -include $subjectPattern)) {
        $fatList += (Get-Item $leafName).Directory
    }
    #Get-Unique only works on sorted collections, Sort-Object won't work without a Property,
    # but "FullName" is not a property of Collections.Generic.List
    # Furthermore, Sort() is not a method of [System.IO.DirectoryInfo]
    $result = ($fatList.Sort() | Get-Unique )
    return $result
}

How do I sort a Collections.Generic.List[System.IO.DirectoryInfo] and then get its unique items?


Solution

  • From your inline comments:

    [...] Sort-Object won't work without a Property, but "FullName" is not a property of Collections.Generic.List

    That's fine: we're not sorting multiple lists, we're sorting multiple DirectoryInfo objects that happen to be contained in a single list, and each of those objects does have a FullName property.

    The big question is: Do you need to sort in-place?

    Sorting "in-place" means re-arranging the objects inside the list, so that the list itself retains the new sort order and its identity. This is usually less resource-intensive, but slightly complicated in PowerShell.

    The alternative is to enumerate the items in the list, sort them externally and then (optionally) wrap the re-ordered items in a new list - much easier to implement, but at a resource cost (which you may or may not notice depending on the size of the collection and the complexity of the comparison).

    Sorting in-place

    In order to sort multiple DirectoryInfo objects, we need a way to instruct the List[DirectoryInfo].Sort() method on how to compare the objects to each other and determine which comes before or after the other in the sort order.

    Looking at the Sort() method overloads gives us a clue:

    PS ~> $list = [System.Collections.Generic.List[System.IO.DirectoryInfo]]::new()
    PS ~> $list.Sort
    
    OverloadDefinitions
    -------------------
    void Sort()
    void Sort(System.Collections.Generic.IComparer[System.IO.DirectoryInfo] comparer)
    void Sort(int index, int count, System.Collections.Generic.IComparer[System.IO.DirectoryInfo] comparer)
    void Sort(System.Comparison[System.IO.DirectoryInfo] comparison)
    

    So we need something that implements the generic interface IComparer[T].

    Using PowerShell's ability to define new types at runtime using the class keyword, we can do:

    using namespace System.Collections.Generic
    using namespace System.IO
    
    class DirectoryInfoComparer : IComparer[DirectoryInfo]
    {
        [string]$PropertyName
        [bool]$Descending = $false
    
        DirectoryInfoComparer([string]$property)
        {
            $this.PropertyName = $property
        }
    
        DirectoryInfoComparer([string]$property, [bool]$descending)
        {
            $this.PropertyName = $property
            $this.Descending = $descending
        }
    
        [int]Compare([DirectoryInfo]$a, [DirectoryInfo]$b)
        {
            $res = if($a.$($this.PropertyName) -eq $b.$($this.PropertyName))
            {
                0
            }
            elseif($a.$($this.PropertyName) -lt $b.$($this.PropertyName))
            {
                -1
            }
            else
            {
                1
            }
    
            if($this.Descending){
                $res *= -1
            }
    
            return $res 
        }
    }
    

    ... and now we can sort the list in-place based on a property name, just like with Sort-Object:

    # Create a list
    $list = [List[DirectoryInfo]]::new()
    
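    # Add directories in non-sorted order
    # (New-Item is used instead of mkdir so this also runs outside Windows)
    New-Item -ItemType Directory -Path c,a,b -Force |ForEach-Object { $list.Add($_) }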
    
    # Instantiate a comparer based on the `FullName` property
    $fullNameComparer = [DirectoryInfoComparer]::new("FullName")
    
    # Now sort the list
    $list.Sort($fullNameComparer)
    
    # Observe that items are now sorted based on FullName value
    $list.FullName
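
    The overload list above also shows Sort(System.Comparison[DirectoryInfo]). As a lighter-weight alternative to a dedicated class, a script block can be cast to that delegate type. The following is only a minimal sketch, assuming a case-insensitive ordinal comparison of FullName is what you want; the directory names and the $byFullName variable are illustrative, and the directories don't need to exist on disk:

    ```powershell
    using namespace System.Collections.Generic
    using namespace System.IO

    # Illustrative sample data: three DirectoryInfo objects in non-sorted order
    $list = [List[DirectoryInfo]]::new()
    foreach ($name in 'c', 'a', 'b') {
        $list.Add([DirectoryInfo]::new((Join-Path $PWD.Path $name)))
    }

    # Cast a script block to Comparison[DirectoryInfo]; it must return
    # a negative number, zero, or a positive number
    $byFullName = [Comparison[DirectoryInfo]] {
        param($x, $y)
        [string]::Compare($x.FullName, $y.FullName, [StringComparison]::OrdinalIgnoreCase)
    }

    # Sorts the list in-place using the Sort(Comparison[T]) overload
    $list.Sort($byFullName)

    $list.Name
    ```

    This avoids defining a class, at the cost of invoking a script block for every comparison, which may be noticeably slower on large lists.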
    

    Sort externally

    Now that we know the trials we must go through to sort a generic collection in-place, let's review the process of sorting the collection externally:

    $sorted = $list |Sort-Object FullName
    

    If we need the resulting (now sorted) collection to also be of type [List[DirectoryInfo]], we can either clear and re-populate the original list:

    $list.Clear()
    $sorted |ForEach-Object {$list.Add($_)}
    

    ... or we can create a new [List[DirectoryInfo]] instance:

    $list = [List[DirectoryInfo]]::new([DirectoryInfo[]]$sorted)
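
    Since the question is ultimately about unique directories, it may be worth noting that Sort-Object has a -Unique switch, so the separate Get-Unique step can be dropped. The following is a sketch of how the function from the question could look under that approach, not a definitive rewrite; the -Path parameter is a hypothetical addition that is not in the original, and duplicates are detected by FullName, which Sort-Object compares case-insensitively by default:

    ```powershell
    function Get-Containers {
        param(
            [Parameter(Mandatory)][string]$SubjectPattern,
            # Hypothetical extra parameter (not in the question); defaults to the current directory
            [string]$Path = '.'
        )

        # Find matching files, take each file's parent directory,
        # then sort by FullName and discard duplicates in one step
        Get-ChildItem -Path $Path -Recurse -Force -File -Include $SubjectPattern |
            ForEach-Object { $_.Directory } |
            Sort-Object -Property FullName -Unique
    }

    # Example call (commented out so the snippet has no side effects):
    # $dirs = Get-Containers -SubjectPattern '*.txt'
    # If a generic list is really required, wrap the result afterwards:
    # [System.Collections.Generic.List[System.IO.DirectoryInfo]]::new([System.IO.DirectoryInfo[]]@($dirs))
    ```

    The function emits plain DirectoryInfo objects to the pipeline, so the caller decides whether a List[DirectoryInfo] is needed at all.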
    

    How about a SortedSet[DirectoryInfo]?

    As already suggested, a "set" might be a better collection type for the purpose of only storing unique items.

    The HashSet[T] type is an unordered set, but .NET also comes with a SortedSet[T] type - and you won't believe what it requires to implement the sort order - that's right, an IComparer[T]! :-)

    In this case, we'll want to inject the comparer into the constructor when we create the set:

    # Once again, we need an IComparer[DirectoryInfo] instance
    $comparer = [DirectoryInfoComparer]::new("FullName")
    
    # Then we create the set, injecting our custom comparer
    $set = [System.Collections.Generic.SortedSet[System.IO.DirectoryInfo]]::new($comparer)
    
    # Now let's add a bunch of directories in completely jumbled order
    Get-ChildItem -Recurse -Directory |Select-Object -First 10 |Sort-Object {Get-Random} |ForEach-Object {
        # The Add() method emits a boolean indicating whether the item 
        # is unique or already exists in the set, hence the [void] cast
        [void]$set.Add($_)
    }
    
    # Once again, observe that enumerating the set emits the items sorted
    $set.FullName
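
    To make the uniqueness behaviour visible, here is a small self-contained sketch that adds the same directory twice and keeps the boolean results of Add(). So that the snippet runs without the class defined earlier, it assumes a comparer built with Comparer[T]::Create from a script block; the directory names and the $added variable are illustrative only:

    ```powershell
    using namespace System.Collections.Generic
    using namespace System.IO

    # Build an IComparer[DirectoryInfo] from a script block (stand-in for the class above)
    $comparer = [Comparer[DirectoryInfo]]::Create([Comparison[DirectoryInfo]] {
        param($x, $y)
        [string]::Compare($x.FullName, $y.FullName, [StringComparison]::OrdinalIgnoreCase)
    })

    $set = [SortedSet[DirectoryInfo]]::new($comparer)

    # Add 'b' twice; Add() returns $false when the comparer reports an equal item already exists
    $added = foreach ($name in 'b', 'a', 'b') {
        $set.Add([DirectoryInfo]::new((Join-Path $PWD.Path $name)))
    }

    $added       # one boolean per Add() call
    $set.Name    # enumerates in sorted order, without the duplicate
    ```

    Note that "duplicate" here means whatever the comparer treats as equal, so two distinct DirectoryInfo objects pointing at the same FullName count as one item.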
    

    As you can see, there are multiple options available, with varying degrees of complexity and performance characteristics. It's not entirely clear from your question why you're using a generic list or why you insist on sorting it using List.Sort(), so my recommendation would be to test them all out and see what works best for you.