vb.net, parallel-processing, parallel-foreach

Parallel.ForEach - iterating through files in folders/subfolders, some files are missing/skipped


I'm trying to compute the CRC32 of all files in a folder and its subfolders and write the results to a simple text box, but some files are skipped when I use a Parallel.ForEach loop.

The code:

Imports System.IO
Imports Force.Crc32

Class Form1

    Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim myPATH As String = "E:\Temp" 'top path to be scanned
        Dim counter As Integer = 0 'count of processed files
        Dim outputTEXT As String = "" 'entire output as text
        Dim stopwatch As New Stopwatch()

        'get list of all files
        Dim allFILES As String() = Directory.GetFiles(myPATH, "*.*", SearchOption.AllDirectories)

        stopwatch.Start()
        ' Use parallel processing to compare CRC32 values
        Parallel.ForEach(allFILES, Sub(myfile)
                                       counter += 1
                                       ' Calculate CRC32 for the file
                                       Dim crc32Value As String = CalculateCRC32(myfile)
                                       outputTEXT += crc32Value & "," & myfile & vbCrLf
                                   End Sub)
        stopwatch.Stop()

        Console.WriteLine("files processed: " + counter.ToString + " |   time :" + stopwatch.Elapsed.ToString)

        txtISPIS.Text = outputTEXT 'populate txtbox with result

    End Sub

After processing a folder tree containing 44,035 image files (.jpg), some files are missing from the result, and the number of missing files varies from run to run.

Six consecutive runs gave:
files processed: 43694 |   time :00:00:06.6360416
files processed: 43784 |   time :00:00:07.1502587 
files processed: 43822 |   time :00:00:05.6360439 
files processed: 43739 |   time :00:00:05.4827714 
files processed: 43744 |   time :00:00:06.5791734 
files processed: 43746 |   time :00:00:06.9342391

A normal For Each loop works as it should and processes all 44,035 files, but it's about 10x slower, so the parallel option is much more interesting... :)

So, what am I doing wrong?

Thanks


Solution

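  • Give this a try. In the original code, counter += 1 and outputTEXT += ... are read-modify-write operations on shared variables, so parallel iterations overwrite each other's updates. Every file is most likely still being processed; it is the updates that get lost. The version below uses Interlocked.Increment and a thread-safe collection instead. The order of the output will vary between runs, but the count should be correct.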

        Dim myPATH As String = "E:\Temp" 'top path to be scanned
        Dim counter As Integer = 0 'count of processed files
        Dim outputTEXT As New Concurrent.BlockingCollection(Of String) '<<<<<<<<<<<<<
        Dim stopwatch As New Stopwatch()
    
        'get list of all files
        Dim allFILES() As String = IO.Directory.GetFiles(myPATH, "*.*", IO.SearchOption.AllDirectories)
    
        stopwatch.Start()
        ' Use parallel processing to compare CRC32 values
        Parallel.ForEach(allFILES, Sub(myfile As String)
                                       Threading.Interlocked.Increment(counter) '<<<<<<<<<<<<<
                                       ' Calculate CRC32 for the file
                                       Dim crc32Value As String = CalculateCRC32(myfile)
                                       outputTEXT.Add(String.Format("{0}, {1}", crc32Value, myfile)) ' '<<<<<<<<<<<<<
                                   End Sub)
    
        stopwatch.Stop()
        Console.WriteLine("files processed: " + counter.ToString + " |   time :" + stopwatch.Elapsed.ToString)
        'join with CrLf (as vbCrLf in the question) so a multiline TextBox shows one file per line
        txtISPIS.Text = String.Join(ControlChars.CrLf, outputTEXT.ToArray)
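
If the output order matters, one possible variation (a sketch, not part of the answer above) is an indexed Parallel.For that writes each result into its own slot of a pre-sized array. No two iterations touch the same slot, so no lock or concurrent collection is needed, and the lines come out in the same order as the input. HashAllFiles and PlaceholderHash are hypothetical names introduced for this sketch. PlaceholderHash only stands in for the question's CalculateCRC32, which is not shown, and would be swapped for the real function.

```vbnet
Imports System.IO
Imports System.Threading.Tasks

Module OrderedParallelSketch

    ' Hypothetical stand-in for the question's CalculateCRC32 (not shown there).
    ' It returns the file length in hex, NOT a real CRC32.
    Function PlaceholderHash(filePath As String) As String
        Dim info As New FileInfo(filePath)
        Return info.Length.ToString("X8")
    End Function

    ' Applies hashFile to every path in parallel and returns one "hash,path"
    ' line per file, in the same order as the input array.
    Function HashAllFiles(files As String(), hashFile As Func(Of String, String)) As String()
        Dim lines(files.Length - 1) As String
        Parallel.For(0, files.Length,
                     Sub(i As Integer)
                         ' Each iteration writes only its own slot, so no shared
                         ' state is modified by more than one thread.
                         lines(i) = hashFile(files(i)) & "," & files(i)
                     End Sub)
        Return lines
    End Function

    Sub Main()
        ' Same top-level path as in the question.
        Dim allFiles As String() = Directory.GetFiles("E:\Temp", "*.*", SearchOption.AllDirectories)

        Dim lines As String() = HashAllFiles(allFiles, AddressOf PlaceholderHash)

        ' The count is simply the array length, so no shared counter is needed.
        Console.WriteLine("files processed: " & lines.Length.ToString())
        Console.WriteLine(String.Join(Environment.NewLine, lines))
    End Sub

End Module
```

In the form from the question, the last step would assign String.Join(Environment.NewLine, lines) to txtISPIS.Text instead of writing to the console.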