
Node.js Writable stream: write vs _write


I'm reading the official Node.js docs to understand streams. I'm implementing a Writable stream, but I can't understand the difference between write and _write.

Quoting the relevant section of the docs:

All calls to writable.write() that occur between the time writable._write() is called and the callback is called will cause the written data to be buffered. When the callback is invoked, the stream might emit a 'drain' event. If a stream implementation is capable of processing multiple chunks of data at once, the writable._writev() method should be implemented.

This only tells me that the two behave differently, but I can't understand how.

Maybe using an example based on the one in the docs, can anyone explain how the two code snippets below differ in behavior when they receive data from a Readable/Transform stream? And what are the counterparts of write and _write in a Readable stream, if any?

const { Writable } = require('stream');

const myWritable = new Writable({
  write(chunk, encoding, callback) {
    if (chunk.toString().indexOf('a') >= 0) {
      callback(new Error('chunk is invalid'));
    } else {
      callback();
    }
  }
});

const { Writable } = require('stream');

class MyWritable extends Writable {
  _write(chunk, encoding, callback) {
    if (chunk.toString().indexOf('a') >= 0) {
      callback(new Error('chunk is invalid'));
    } else {
      callback();
    }
  }
}

Solution

  • .write() is what the consumer (user) of a writable stream calls to write data to the stream object. It is the public interface, meant to be used by anyone using a writable stream.

    ._write() is an internal interface. It should NOT be called by the consumer of a writable stream. It is supplied by the implementer of a specific type of stream, and the stream itself calls it whenever data needs to be written to whatever storage actually sits behind the stream. It is the part of the stream abstraction that talks to the underlying storage. For example, if you implemented a stream object that represents a serial port, you, as the implementer, would supply a ._write() that physically sends bytes to the serial port whenever the stream infrastructure calls it.
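
A minimal sketch of that split, assuming an in-memory array as a stand-in for real storage such as a serial port. `MemorySink` and its `chunks` array are hypothetical names used only for illustration: the implementer supplies `_write()`, while the consumer only ever calls the public `write()` and `end()`.

```javascript
const { Writable } = require('stream');

// Hypothetical implementer: stores chunks in an array instead of sending
// bytes to real hardware. Only _write() is supplied here.
class MemorySink extends Writable {
  constructor(options) {
    super(options);
    this.chunks = []; // stand-in for the "storage" behind the stream
  }

  // Called by the stream machinery, never by the consumer.
  _write(chunk, encoding, callback) {
    this.chunks.push(chunk.toString());
    callback(); // signal that this chunk has been handled
  }
}

// Consumer: uses only the public interface.
const sink = new MemorySink();
sink.write('hello ');
sink.write('world');
sink.end(() => {
  console.log(sink.chunks.join('')); // prints "hello world"
});
```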

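    Note that there is no 1-to-1 correspondence between the consumer's calls to .write() and the stream infrastructure's calls to ._write(), because the stream object buffers data internally: while a ._write() is still pending (its callback has not been called yet), further .write() calls are queued, and the queued chunks are handed to ._write() one at a time afterwards (or all at once to ._writev(), if the implementer provides it).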
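
A small sketch of that buffering, assuming an artificially slow `_write()` (it defers its callback with `setImmediate`) and a deliberately tiny `highWaterMark` of 4 bytes; the `log` array exists only to record call order. All three `write()` calls happen before the second `_write()` runs, `write()` starts returning `false` once the buffered bytes reach `highWaterMark`, and `'drain'` fires after the buffer has been flushed.

```javascript
const { Writable } = require('stream');

const log = []; // records the order of calls, for illustration only

// Implementer side: each write "takes a while" because the callback is
// deferred, so later write() calls arrive while a _write() is pending.
const slow = new Writable({
  highWaterMark: 4, // tiny buffer limit in bytes, so backpressure shows quickly
  write(chunk, encoding, callback) {
    log.push(`_write(${chunk})`);
    setImmediate(callback); // pretend the underlying storage is slow
  },
});

// 'drain' fires once the internal buffer has been fully flushed.
slow.once('drain', () => {
  log.push('drain');
  slow.end();
});
slow.on('finish', () => console.log(log.join('\n')));

// Consumer side: three write() calls in the same tick.
for (const word of ['one', 'two', 'three']) {
  const ok = slow.write(word); // false means "buffer is full, wait for 'drain'"
  log.push(`write(${word}) -> ${ok}`);
}
```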

    This design also lets one common stream interface sit on top of thousands of different storage mechanisms: supplying a ._write() method is the part of implementing a storage backend that exposes it through the generic stream interface.
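
To illustrate, here is a sketch in which one piece of consumer code is written once against the generic interface and works unchanged with two different backends. `writeGreeting`, `upperSink` and `counter` are hypothetical names; each backend differs only in the write logic it supplies.

```javascript
const { Writable } = require('stream');

// Consumer code, written once against the generic Writable interface.
function writeGreeting(stream) {
  stream.write('hello\n');
  stream.end();
}

// Hypothetical backend 1: stores upper-cased text in an array.
const upper = [];
const upperSink = new Writable({
  write(chunk, encoding, callback) {
    upper.push(chunk.toString().toUpperCase());
    callback();
  },
});

// Hypothetical backend 2: only counts the bytes it receives.
let byteCount = 0;
const counter = new Writable({
  write(chunk, encoding, callback) {
    byteCount += chunk.length;
    callback();
  },
});

writeGreeting(upperSink);
writeGreeting(counter);
```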


    In your first code example, supplying a write property in the options object passed to the stream constructor actually provides the implementation of _write(): the constructor takes that function and installs it as the stream's _write() method. Yes, this is confusing.
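
A quick way to check this, as a sketch (`validate` and `viaOptions` are illustrative names): the function passed as the `write` option ends up as the instance's `_write`, while the public `write()` is still the method inherited from `Writable.prototype`.

```javascript
const { Writable } = require('stream');

// Same validation logic as the question's first example.
function validate(chunk, encoding, callback) {
  if (chunk.toString().includes('a')) {
    callback(new Error('chunk is invalid'));
  } else {
    callback();
  }
}

const viaOptions = new Writable({ write: validate });

// The constructor installed the option as the instance's _write method...
console.log(viaOptions._write === validate); // true
// ...while the public write() is still the shared prototype method.
console.log(viaOptions.write === Writable.prototype.write); // true
```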

    In your second example, you're subclassing Writable and defining the method directly, so you must override _write(), because that's the method implementers are required to provide.
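
A sketch suggesting that the two variants behave the same when fed from a Readable, assuming Node.js 12+ for `Readable.from()`. The input `['xyz', 'abc']` and the names `check`, `fromOptions` and `results` are illustrative. `pipeline()` calls the writable's public `write()`; the stream machinery then calls `_write()` per chunk, and the error from the `'abc'` chunk reaches the pipeline callback in both cases.

```javascript
const { Readable, Writable, pipeline } = require('stream');

// Shared validation logic from the question: reject chunks containing 'a'.
function check(chunk, encoding, callback) {
  if (chunk.toString().includes('a')) {
    callback(new Error('chunk is invalid'));
  } else {
    callback();
  }
}

// Variant 1: `write` option (installed as _write by the constructor).
const fromOptions = () => new Writable({ write: check });

// Variant 2: subclass overriding _write directly.
class MyWritable extends Writable {
  _write(chunk, encoding, callback) {
    check(chunk, encoding, callback);
  }
}

const results = {};
const variants = [['options', fromOptions], ['subclass', () => new MyWritable()]];
for (const [name, make] of variants) {
  pipeline(Readable.from(['xyz', 'abc']), make(), (err) => {
    results[name] = err ? err.message : 'ok';
    console.log(name, results[name]);
  });
}
```

The Readable side follows the same split: consumers call read() (or use pipe() or 'data' events), while implementers supply _read() (or a read option in the constructor options) and hand data to the stream with this.push().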