I'm trying to process a bunch of HTML files (several thousand, each with a size of around 500KB) with a C++ executable, but I'm experiencing very poor performance when reading the files sequentially. The core C++ code is basically this:
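#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> filePaths{ "file1", "file2" /* , ... */ }; // Many file paths
    int i = 0;
    for (const std::string& filePath : filePaths) {
        std::ifstream file(filePath, std::ios::binary); // Only opens the file; nothing is read yet
        std::cout << i++ << std::endl;
    }
    return 0;
}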
I compiled this with MSVC 2022 as well as MinGW (GCC 13.1.0), both with the standard CMake Release optimizations enabled. However, the runtime performance on my fairly modern machine (Win11, 32 GB RAM, Ryzen 9, SSD) is far below what I would have expected.
Extracting the std::ifstream file object out of the loop and using file.open/file.close in the loop, or even using C-style fopen/fclose instead of std::ifstream, makes no difference.
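For reference, the two variants I tried look roughly like the sketch below. The function names openWithReusedStream and openWithCStdio are made up for this post, and each one just returns how many files could be opened, so the two can be compared on the same list of paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Variant 1 (hypothetical name): one std::ifstream object, reused via open()/close().
std::size_t openWithReusedStream(const std::vector<std::string>& filePaths) {
    std::size_t opened = 0;
    std::ifstream file;
    for (const std::string& filePath : filePaths) {
        file.open(filePath, std::ios::binary);
        if (file.is_open()) {
            ++opened;
            file.close();
        }
        file.clear(); // Reset any error state before the next iteration
    }
    assert(opened <= filePaths.size()); // Sanity check only
    return opened;
}

// Variant 2 (hypothetical name): C-style fopen()/fclose().
std::size_t openWithCStdio(const std::vector<std::string>& filePaths) {
    std::size_t opened = 0;
    for (const std::string& filePath : filePaths) {
        if (std::FILE* f = std::fopen(filePath.c_str(), "rb")) {
            ++opened;
            std::fclose(f);
        }
    }
    assert(opened <= filePaths.size()); // Sanity check only
    return opened;
}
```

Both variants still perform one open per file, which would be consistent with neither of them changing the runtime.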
Am I missing something? I can't believe that it takes several minutes to just open (not even process yet) a few thousand small files...
Edit: I ran the same binary on another Windows machine (also with Win11 and SSD) and got the same slow results.
Compiling and running the same code on a Mac Mini, however, gave the expected results: it finished within a few milliseconds.
I think I finally found the reason: it was indeed Windows Defender (on both Windows machines) that slowed down the processing. As I mentioned in a comment yesterday, I had already added my executable to the Defender exclusion list, but that didn't change anything. What did help was adding the directory that contains the files to the exclusion list. The files were mostly HTML files, so I guess Defender does an extra check when they are opened?
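To check whether a scanner is the bottleneck on a given machine, it can help to time each open individually and compare a run with and without the directory exclusion. Below is a minimal sketch of that idea, assuming std::chrono::steady_clock is precise enough for this purpose; OpenStats and measureOpenTimes are names I made up, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper type: aggregate timings for a batch of open calls.
struct OpenStats {
    std::size_t opened = 0;                                // Files opened successfully
    std::chrono::duration<double, std::milli> total{0};    // Sum of all open+close times
    std::chrono::duration<double, std::milli> slowest{0};  // Longest single open+close
};

// Opens and closes every file once, timing each attempt in milliseconds.
OpenStats measureOpenTimes(const std::vector<std::string>& filePaths) {
    using Clock = std::chrono::steady_clock;
    OpenStats stats;
    for (const std::string& filePath : filePaths) {
        const auto start = Clock::now();
        std::ifstream file(filePath, std::ios::binary);
        const bool ok = file.is_open();
        file.close();
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

        if (ok) {
            ++stats.opened;
        }
        stats.total += elapsed;
        if (elapsed > stats.slowest) {
            stats.slowest = elapsed;
        }
    }
    assert(stats.opened <= filePaths.size()); // Sanity check only
    return stats;
}
```

Printing stats.total.count() and stats.slowest.count() for the same file list before and after changing the exclusion list should show whether the per-file open cost is what dominates.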
This leaves room for another question: what steps would be necessary so that someone else who uses the executable on their system to process HTML files doesn't need to disable their antivirus software? Could signing the executable help?