I am trying to replace an ETL process using Node.js streams. The Transform stream I am writing takes in a dataset and, based on configuration data, outputs one or more records for each input record. In other words, if it reads 100,000 records, the transformation can end up writing anywhere from 100,000 to 400,000 records. The _transform method only allows its callback to be called once, so I am trying to figure out how to output multiple objects for a single input object.
I looked at Duplex streams, but every example I saw used one as a two-way flow, whereas I definitely want my stream to be one-way (or I may just not understand how they work). Does anyone have suggestions on how to implement this?
The callback can only be called once, but the .push method is what emits data, and it can be called as many times as necessary in the _transform method. Example:
const { Transform } = require('stream');

class MyTransform extends Transform {
  _transform(chunk, enc, next) {
    // chunk is a Buffer by default, so convert it to a string before splitting.
    const arrayFromChunk = chunk.toString().split(',');
    arrayFromChunk.forEach(piece => {
      // this.push is what emits readable data; it can be called as often
      // as needed.
      this.push(piece);
    });
    next(); // next can only be called once.
  }
}
docs here: https://nodejs.org/docs/latest-v18.x/api/stream.html#stream_implementing_a_transform_stream
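Since the question is about records rather than comma-separated text, here is a rough sketch of the same idea in object mode. The `ExpandRecords` class, the `copies` field, and the `collect` helper are made up for illustration (the real expansion rule would come from your configuration data); only `Transform`, `Readable.from`, and `push` are actual Node.js APIs.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical transform: emits `record.copies` output objects per input object.
class ExpandRecords extends Transform {
  constructor(options = {}) {
    // objectMode lets the stream carry plain objects instead of Buffers/strings.
    super({ ...options, objectMode: true });
  }

  _transform(record, _encoding, callback) {
    // Assumed config rule: default to one output record when `copies` is absent.
    const copies = record.copies ?? 1;
    for (let i = 0; i < copies; i++) {
      // push() may be called any number of times per input record.
      this.push({ id: record.id, part: i + 1 });
    }
    // The callback signals that this input record is done; call it exactly once.
    callback();
  }
}

// Hypothetical helper: runs an array of records through the transform
// and gathers everything it emits.
async function collect(records) {
  const out = [];
  for await (const item of Readable.from(records).pipe(new ExpandRecords())) {
    out.push(item);
  }
  return out;
}

// Usage: two input records become three output records.
collect([{ id: 'a', copies: 2 }, { id: 'b' }]).then(out => console.log(out));
```

So a Transform is still a one-way pipeline stage from the caller's point of view; it just is not limited to one output per input.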