database ceph rocksdb

How can a WAL (write-ahead log) perform better than writing directly to disk?


The WAL (Write-Ahead Log) technology has been used in many systems.

The mechanism of a WAL is that when a client writes data, the system does two things:

  1. Write a log to disk and return to the client
  2. Write the data to disk, cache or memory asynchronously
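To make the two steps concrete, here is a minimal sketch of a log-first key-value store. It is only an illustration: the class name `TinyWAL`, the JSON-lines record format, and the file layout are assumptions, not how any particular system does it. The in-memory `dict` stands in for the data that a real system would write out asynchronously.

```python
import json
import os


class TinyWAL:
    """Toy key-value store (hypothetical): log first, then apply to memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # recover state from an existing log, if any
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn record from an interrupted write; stop here
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Step 1: append the record to the log and force it to disk.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # Step 2: apply the change. A real system would defer and batch
        # the write to the main data file; here it is just a dict update.
        self.data[key] = value

    def close(self):
        self.log.close()
```

After a crash, constructing `TinyWAL` again on the same path replays the log and rebuilds `data`, which is the recovery benefit described below.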

There are two benefits:

  1. If some exception occurs (e.g. power loss), we can recover the data from the log.
  2. The performance is good because we write data asynchronously and can batch operations.
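As a rough illustration of the batching benefit: because the log already makes each update durable, the writes to the main data can be deferred and collapsed. The sketch below assumes updates are simple `(key, value)` pairs; `coalesce` is a hypothetical helper, not part of any real system.

```python
def coalesce(updates):
    """Collapse logged (key, value) updates so each key is written once."""
    latest = {}
    for key, value in updates:
        latest[key] = value  # a later update to the same key wins
    return latest


logged = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
to_flush = coalesce(logged)
# Five logged updates need only len(to_flush) in-place writes to the data file.
```

Writing every update directly to its final location gives up this opportunity, since each one has to reach the data file before the client is acknowledged.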

Why not just write the data to disk directly? Every write goes straight to disk. On success, you tell the client it succeeded; if the write fails, you return a failure response or time out.

In this way, you still have those two benefits.

  1. You do not need to recover anything after a power loss, because every success response returned to the client means the data really is on disk.
  2. Performance should be the same. We touch the disk frequently, but so does a WAL (every successful WAL write means the log record is on disk).

So what is the advantage of using a WAL?


Solution

  • Performance.

    This was much more important when spinning disks were the standard technology, because seek times and rotational latency were a big issue. Seeking is the physical process of getting the right part of the disk under the read/write head. A WAL is appended sequentially, so it avoids most of that movement, whereas writing each piece of data to its final location scatters writes across the disk. With SSDs those considerations are not so important, but avoiding some writes and making large sequential writes still help.

    Update:

    SSDs also have better performance with large sequential writes, but for different reasons. It is not as simple as saying "no seek time or rotational latency, therefore just write randomly". For example, writing large blocks into space the SSD knows is "free" (e.g. via the TRIM command to the drive) is better than read-modify-write, where the drive also needs to manage wear levelling and potentially map updates into different internal block sizes.
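For the spinning-disk case, a toy cost model shows the difference. It counts a "seek" whenever a write does not start where the previous one ended. The function name, block size, and offsets are made up for illustration, and the model ignores rotational latency, caching, and everything SSD-specific.

```python
import random


def count_seeks(offsets, size):
    """Count writes that do not start where the previous write ended."""
    seeks = 0
    position = None  # head position is unknown before the first write
    for offset in offsets:
        if position is not None and offset != position:
            seeks += 1
        position = offset + size
    return seeks


BLOCK = 4096
rng = random.Random(0)

# In-place updates land wherever the data happens to live on disk.
in_place = [rng.randrange(0, 1_000_000) * BLOCK for _ in range(1000)]
# WAL appends always start right after the previous record.
log_appends = [i * BLOCK for i in range(1000)]

print("in-place seeks:", count_seeks(in_place, BLOCK))
print("log append seeks:", count_seeks(log_appends, BLOCK))
```

In this model the appended log never seeks after the first write, while almost every scattered in-place write does.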