
Is it reasonable to read data from disk in parallel?


An application may need to read data or files from disk and load them into memory. Many programming languages support using multiple CPUs to do this work. I am wondering whether reading from the disk in parallel is a reasonable option. Parallel or concurrent reads will harm the disk, right? Could you please give some advice on how to design this kind of system? Thanks in advance.


Solution

  • If you are after performance, reading data in parallel is usually one of the most effective things you can do. The more outstanding requests you give a disk, the faster it can generally complete the whole set, because the OS and the drive are free to reorder and overlap them.

    The only problem with reading data concurrently is that your application has to handle it correctly. Typically this means using threads, although there are OS-specific solutions that may help, such as AIO on Linux.

    Lastly, the term "reasonable" is somewhat loaded. Reading data concurrently may be faster, but is there a good use case, does it improve the user experience, and is it worth the extra code complexity? In most cases, the answer would be no.
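
As a rough illustration of the thread-based approach mentioned above, here is a minimal Python sketch that reads several files concurrently with a thread pool. The function names `read_file` and `read_files_parallel` and the default of 8 workers are assumptions made for this example, not something from the original answer; whether it actually beats a plain loop depends on your disk, filesystem, and file sizes, so measure before committing to it.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully into memory and return its bytes."""
    with open(path, "rb") as f:
        return f.read()


def read_files_parallel(paths, max_workers=8):
    """Read many files concurrently and return a {path: bytes} dict.

    max_workers=8 is an arbitrary starting point (an assumption);
    tune it for your storage.
    """
    paths = list(paths)  # materialize so we can iterate twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map submits every read up front and yields results
        # in the same order as the input paths.
        contents = pool.map(read_file, paths)
        return dict(zip(paths, contents))
```

Calling `read_files_parallel(["a.bin", "b.bin"])` (hypothetical file names) returns a dict mapping each path to its contents. Threads are a reasonable fit here because CPython releases the GIL while a thread is blocked on file I/O, so the reads can overlap even though only one thread runs Python code at a time.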