powershell, parallel-processing, foreach-object

Recursively call a function from itself inside a ForEach-Object -Parallel block - Function is not recognized inside the parallel block


First time asker here. Please be kind :)

I'm trying to recursively list all directories in parallel, hoping to reduce the time it takes to traverse a drive. Below is the code I've tried. Essentially, I want to pass in a folder and process its subfolders, their subfolders, and so on in parallel, but the function is not recognized inside the parallel block.

function New-RecursiveDirectoryList {
    [CmdletBinding()]
    param (
        # Specifies a path to one or more locations.
        [Parameter(Mandatory = $true,
            Position = 0,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true,
            HelpMessage = 'Path to one or more locations.')]
        [Alias('PSPath')]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $Path
    )
    process {
        foreach ($aPath in $Path) {
            Get-Item $aPath

            Get-ChildItem -Path $aPath -Directory |
                # Recursively call itself in Parallel block not working
                # Getting error "The term 'New-RecursiveDirectoryList' is not recognized as a name of a cmdlet"
                # Without -Parallel switch this works as expected
                ForEach-Object -Parallel {
                    $_ | New-RecursiveDirectoryList
                }
        }
    }
}

Error:

New-RecursiveDirectoryList: 
Line |
   2 |                      $_ | New-RecursiveDirectoryList
     |                           ~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The term 'New-RecursiveDirectoryList' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

I've also tried the solution provided by mklement0 here, but with no luck. Below is my attempt:

function CustomFunction {
    [CmdletBinding()]
    param (
        # Specifies a path to one or more locations.
        [Parameter(Mandatory = $true,
            Position = 0,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true,
            HelpMessage = 'Path to one or more locations.')]
        [Alias('PSPath')]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $Path
    )

    begin {
        # Get the function's definition *as a string*
        $funcDef = $function:CustomFunction.ToString()
    }

    process {
        foreach ($aPath in $Path) {
            Get-Item $aPath

            Get-ChildItem -Path $aPath -Directory |
                # Recursively call itself in Parallel block not working
                # Getting error "The term 'New-RecursiveDirectoryList' is not recognized as a name of a cmdlet"
                # Without -Parallel switch this works as expected
                ForEach-Object -Parallel {
                    $function:CustomFunction = $using:funcDef
                    $_ | CustomFuction
                }
        }
    }
}

Error:

CustomFuction: 
Line |
   3 |                      $_ | CustomFuction
     |                           ~~~~~~~~~~~~~
     | The term 'CustomFuction' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

Does anybody know how this may be accomplished or a different way of doing this?


Solution

  • This worked for me, although it isn't pretty. One thing to note: the foreach ($aPath in $Path) {...} loop in your script is unnecessary here. The process {...} block runs once per pipeline input object, and an array passed as an argument can be piped straight to Get-ChildItem, as the code below does.

    Code:

    function Test {
        [CmdletBinding()]
        param (
            # Specifies a path to one or more locations.
            [Parameter(
                Mandatory,
                ParameterSetName = 'LiteralPath',
                ValueFromPipelineByPropertyName,
                Position = 0)]
            [Alias('PSPath')]
            [string[]] $LiteralPath
        )
    
        begin {
            $scriptblock = $MyInvocation.MyCommand.ScriptBlock.ToString()
        }
    
        process {
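            # Get-Item $LiteralPath  <= this extra call would slow down the script
            $LiteralPath | Get-ChildItem -Directory | ForEach-Object -Parallel {
                # Rebuild the function body inside this runspace from its source text
                $sb = $using:scriptblock
                $def = [scriptblock]::Create($sb)
                $_            # emit the directory here instead of calling Get-Item
                $_ | & $def   # recurse into it
            }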
        }
    }
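
    The code above and the question's second attempt rely on the same idea: each ForEach-Object -Parallel script block runs in its own runspace, which does not see functions defined by the caller, so the definition is carried across as a string and rebuilt there. Here is a minimal sketch of that mechanism on its own, assuming PowerShell 7+; Get-Square is a hypothetical helper used only for illustration.

    ```powershell
    # Requires PowerShell 7+, where ForEach-Object -Parallel is available.
    # Get-Square is a hypothetical helper, not part of the original code.
    function Get-Square {
        param([int] $Number)
        $Number * $Number
    }

    # Capture the function body as a string so it can be passed with $using:
    $funcDef = ${function:Get-Square}.ToString()

    $results = 1..5 | ForEach-Object -Parallel {
        # The parallel runspace starts without the caller's functions,
        # so recreate the function from its definition string first.
        ${function:Get-Square} = $using:funcDef
        Get-Square -Number $_
    } | Sort-Object   # parallel output order is not guaranteed, so sort it

    $results -join ', '   # 1, 4, 9, 16, 25
    ```

    The name used in the call has to match the name assigned through the function: drive exactly, otherwise the runspace reports the same "not recognized" error.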
    

    Looking back at this answer, what I would recommend today is to avoid recursion and use a ConcurrentStack<T> instead; it should be far more efficient and consume less memory. Also worth noting: as mklement0 pointed out in his comment, your code was correct to begin with. The issue was a typo: $_ | CustomFuction should have been $_ | CustomFunction.

    function Test {
        [CmdletBinding()]
        param (
            [Parameter(
                Mandatory,
                ParameterSetName = 'LiteralPath',
                ValueFromPipelineByPropertyName,
                Position = 0)]
            [Alias('PSPath')]
            [string[]] $LiteralPath,
    
            [Parameter()]
            [ValidateRange(1, 64)]
            [int] $ThrottleLimit = 5
        )
    
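        begin {
            # Thread-safe stack, shared with the parallel runspaces via $using:
            $stack = [System.Collections.Concurrent.ConcurrentStack[System.IO.DirectoryInfo]]::new()
            $dir = $null
        }
    
        process {
            # Seed the stack with the starting directories
            $stack.PushRange($LiteralPath)
            # Keep popping until no unvisited directories remain
            while ($stack.TryPop([ref] $dir)) {
                $dir | Get-ChildItem -Directory | ForEach-Object -Parallel {
                    $stack = $using:stack
                    $stack.Push($_)   # queue the subdirectory for a later pass
                    $_                # emit it to the pipeline
                } -ThrottleLimit $ThrottleLimit
            }
        }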
    }
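
    The stack-based version works because $using:stack gives each parallel runspace a reference to the same collection object rather than a copy, and ConcurrentStack<T> is safe to push to from several threads at once. Here is a minimal sketch of just that part, assuming PowerShell 7+ and using integers instead of directories to keep it self-contained.

    ```powershell
    # Requires PowerShell 7+.
    $stack = [System.Collections.Concurrent.ConcurrentStack[int]]::new()

    1..10 | ForEach-Object -Parallel {
        # $using: hands over the caller's object itself, not a copy,
        # so every runspace pushes onto the same stack.
        $shared = $using:stack
        $shared.Push($_ * 2)
    } -ThrottleLimit 4

    $stack.Count                         # 10
    ($stack | Measure-Object -Sum).Sum   # 110
    ```

    A regular, non-thread-safe collection such as a generic List would not be a safe substitute here, since several runspaces may write to it at the same time.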