[SOLVED] How to maximize SSD I/O in C++?

I wrote a program that reads and writes a 2D array to and from an NVMe SSD (Samsung 970 EVO Plus).

I designed the program to read the N*M array like this:

#pragma omp parallel for
for(int i=0;i<N;i++)
  fstream.read(...) // read M bytes

But this code achieves only KB/s throughput, far below the SSD's specification, which is in the GB/s range.

I thought that if the size M is larger than the block size (maybe 4 KB) and a multiple of 2, this code would reach GB/s performance.

However, it doesn't. I think I missed something.

Is there any C++ code for maximizing I/O performance on an SSD?

Solution

No matter how much data you ask fstream to read, the request is likely to be serviced through a fixed-size streambuf buffer. The C++ standard does not specify its default size, but 4 KB is fairly common. So passing a 4 MB size to read() will very likely end up as roughly 1024 reads of 4 KB each. This likely explains the performance you observed: instead of reading one large chunk of data at once, your application makes many calls that each read a small chunk.

The C++ standard does provide a means of resizing the internal stream buffer, the pubsetbuf method, but leaves it to each C++ implementation to specify exactly when and how a stream buffer can be configured with a non-default size. Your C++ implementation may allow you to resize the stream buffer only before opening your std::ifstream, or it may not allow you to resize a std::ifstream's default stream buffer at all; in that case you must construct your own stream buffer instance first, and then use rdbuf() to attach it to the stream. Consult your C++ library's documentation for more information.
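As a rough illustration of the first option, here is a minimal sketch that calls pubsetbuf before open(). The function name read_with_big_buffer and the 1 MB default buffer size are made up for this example, and whether the request has any effect is implementation-defined, so measure it on your own toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` bytes from `path` through an
// std::ifstream that was asked to use a `buf_size`-byte stream buffer.
// Returns the bytes actually read (empty if the file cannot be opened).
std::vector<char> read_with_big_buffer(const std::string& path,
                                       std::size_t count,
                                       std::size_t buf_size = 1 << 20) {
    assert(buf_size > 0);

    // Declared before the stream so that it outlives the stream.
    std::vector<char> stream_buf(buf_size);

    std::ifstream in;
    // Request the larger buffer BEFORE open(). The library is free to
    // ignore this request; the effect is implementation-defined.
    in.rdbuf()->pubsetbuf(stream_buf.data(),
                          static_cast<std::streamsize>(stream_buf.size()));
    in.open(path, std::ios::in | std::ios::binary);

    std::vector<char> out(count);
    if (!in) {
        out.clear();
        return out;
    }

    in.read(out.data(), static_cast<std::streamsize>(out.size()));
    // gcount() reports how many bytes the last read() actually delivered.
    out.resize(static_cast<std::size_t>(in.gcount()));
    return out;
}
```

Calling read_with_big_buffer("data.bin", 4 * 1024 * 1024) would then request 4 MB in one read() call; "data.bin" is only a placeholder file name.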

Alternatively, consider using your operating system's native file input/output system calls and bypassing the stream buffer library altogether, since it adds some overhead of its own. It's likely that the contents of the file are first read into the stream buffer and then copied into the buffer you pass to read(). Calling the native file input system calls eliminates this redundant copy and squeezes out a little more performance.
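A minimal sketch of the native route, assuming a POSIX system such as Linux: it uses open() and pread(), and the helper name read_native is made up for this example. Other operating systems have different calls.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>   // open, O_RDONLY (POSIX, an assumption of this sketch)
#include <unistd.h>  // pread, close

// Hypothetical helper: reads up to `count` bytes starting at byte `offset`
// of `path` straight into the caller-visible buffer, with no stream buffer
// in between. Returns the bytes actually read.
std::vector<char> read_native(const std::string& path,
                              off_t offset,
                              std::size_t count) {
    assert(offset >= 0);
    std::vector<char> out(count);

    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) {
        out.clear();
        return out;
    }

    std::size_t done = 0;
    while (done < count) {
        // pread() may return fewer bytes than requested, so loop.
        ssize_t n = ::pread(fd, out.data() + done, count - done,
                            offset + static_cast<off_t>(done));
        if (n > 0) {
            done += static_cast<std::size_t>(n);
        } else if (n == 0) {
            break;               // end of file
        } else if (errno != EINTR) {
            break;               // real error; return what we have
        }
    }

    ::close(fd);
    out.resize(done);
    return out;
}
```

Because pread() takes an explicit offset and does not move a shared file position, each iteration of a parallel loop like the one in the question could read its own row with something like read_native(path, i * M, M), rather than sharing one fstream between threads. Whether that reaches the drive's rated throughput still has to be measured.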