I'm trying to optimizie an I/O bound C++ Win32 application. What it actually does is something very similar to recurse a folder and compute a cryptographic hash of every file it finds. It's a single threading application using memory mapped files, so as it's easy to imagine it doesn't seem to use much CPU since most of the times the main thread is put to sleep waiting for I/O to complete. I'm thinking about a couple of solutions, but I'm not sure about it so I'd like to have your opinions.
Any other idea/suggestion/etc would be really appreciated :)
Thanks.
Before parallelizing anything, always ask yourself first: Does the added complexity justify the performance gained? In order to answer that question with minimal effort, just test how much % of max read throughput you already have. That is, test your current read throughput and then test max throughput. Don't use the theoretical max for this computation. Then, think about how much complexity and how many possible issues are introduced in even the simplest approach to gain the last few %.
As already mentioned in the comments, the greatest performance gain here is probably achieved by pipelining (i.e. overlapping computation and I/O). And the easiest way to implement that is with asynchronous reads. This thread lists multiple ways to implement asnychronous file I/O in C++.
If you don't need portability, just use the Windows OVERLAPPED API. Boost ASIO does not seem to make File I/O very easy (yet). I could not find any good examples.
Note that, depending on your system configuration, you have to launch multiple threads to fully saturate I/O bandwidth (especially if files of that folder actually reside on multiple disks, which is possible). Even if you only read from one device, you might fair (slightly) better with multiple threads to mitigate OS overhead.