powershellcsvoptimizationreader

Check all lines in a huge CSV file in PowerShell


I want to work with a CSV file of more than 300,000 lines. I need to verify information line by line and then display it in a .txt file in the form of a table to see which file was missing for all servers. For example

Name,Server
File1,Server1
File2,Server1
File3,Server1
File1,Server2
File2,Server2
...
File345,Server76
File346,Server32

I want to display in table form this result which corresponds to the example above:

Name     Server1 Server2 ... Server 32 ....Server 76
File1       X       X
File2       X       X
File3       X       
...
File345                                       X
File346                          X

To do this actually, I have a function that creates objects where the members are the Server Name (The number of members object can change) and I use stream reader to split data (I have more than 2 columns in my csv so 0 is for the Server name and 5 for the file name)

$stream = [System.IO.StreamReader]::new($File)
$stream.ReadLine()  | Out-Null
while ((-not $stream.EndOfStream)) {
    $line = $stream.ReadLine()
    $strTempo = $null
    $strTempo = $line -split ","
    $index = $listOfFile.Name.IndexOf($strTempo[5])
    if ($index -ne -1) {   
        $property = $strTempo[0].Replace("-", "_")
        $listOfFile[$index].$property = "X"
    }
    else {
        $obj = CreateEmptyObject ($listOfConfiguration)
        $obj.Name = $strTempo[5]
        $listOfFile.Add($obj) | Out-Null
    }
}

When I export this I have a pretty good result. But the script take so much time (between 20min to 1hour)

I didn't know how optimize actually the script. I'm beginner to PowerShell.

Thanks for the futures tips


Solution

  • You might use HashSets for this:

    $Servers = [System.Collections.Generic.HashSet[String]]::New()
    $Files = @{}
    Import-Csv -Path $Path |ForEach-Object {
        $Null = $Servers.Add($_.Server)
        if ($Files.Contains($_.Name)) { $Null = $Files[$_.Name].Add($_.Server) }
        else { $Files[$_.Name] = [System.Collections.Generic.HashSet[String]]$_.Server }
    }
    $Table = foreach($Name in $Files.get_Keys()) {
        $Properties = [Ordered]@{ Name = $Name }
        ForEach ($Server in $Servers) {
            $Properties[$Server] = if ($Files[$Name].Contains($Server)) { 'X' }
        }
        [PSCustomObject]$Properties
    }
    $Table |Format-Table -Property @{ expression='*' }
    

    Note that in contrast to PowerShell's usual behavior, the .Net HashSet class is case-sensitive by default. To create an case-insensitive HashSet use the following constructor:

    [System.Collections.Generic.HashSet[String]]::New([StringComparer]::OrdinalIgnoreCase)