javascript, node.js, character-encoding, chunked-encoding, node-streams

How can I read data from a Node.js stream without losing it?


I need to detect the character encoding of a Node.js stream, and I am using the detect-character-encoding module for this. The problem is that the module only accepts a buffer, not a stream, so I have to do something like this:

FileStream.on('data', (chunk) => {
  console.log(chunk)
  const charsetMatch = detectCharacterEncoding(chunk)
  console.log(charsetMatch)
})

Detecting the encoding this way consumes a chunk of data that is needed later in the code flow. Is there a way to peek at a chunk, determine its encoding, and still keep that chunk and the rest of the data?


Solution

  • You can wrap the stream in a promise that buffers it and resolves with both the contents and the detected charset. Note that this holds the whole stream in memory:

    const detectCharacterEncoding = require('detect-character-encoding');
    
    // Buffers the whole stream, then resolves with its content and detected charset.
    const charsetStream = (stream) => new Promise((resolve, reject) => {
    
      const chunks = [];
    
      stream.on('data', (chunk) => {
        chunks.push(chunk);
      })
    
      stream.on('end', () => {
        const content = Buffer.concat(chunks);
        resolve({
          content,
          charset: detectCharacterEncoding(content)
        })
      })
    
      stream.on('error', (err) => {
        reject(err);
      })
    
    });
    
    charsetStream(FileStream)
      .then(info => {
        console.log('content', info.content);
        console.log('charset', info.charset);
      })
      .catch(console.error);
    
    // FileStream is the same object as the "stream" parameter above. This extra
    // listener still receives every chunk because it is attached before the stream
    // starts emitting data, but a stream can only be consumed once.
    FileStream.on('data', (chunk) => {
      console.log('FileStream', chunk.toString());
    })
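
  • If buffering the whole stream is too expensive, one alternative is to read only the first available data and then push it back with readable.unshift(), so later consumers still see it. The following is a minimal sketch of that idea, not a tested drop-in. The helper name peekFirstChunk is hypothetical, and it assumes nothing else has started consuming the stream yet:

```javascript
// Hypothetical helper: resolves with the first available data of a readable
// stream (or null if the stream is empty) without consuming it.
function peekFirstChunk(stream) {
  return new Promise((resolve, reject) => {
    function cleanup() {
      stream.off('readable', onReadable);
      stream.off('error', onError);
      stream.off('end', onEnd);
    }

    function onReadable() {
      const chunk = stream.read();
      if (chunk === null) return; // nothing buffered yet, wait for the next event
      cleanup();
      stream.unshift(chunk); // put the data back at the front of the internal buffer
      resolve(chunk);
    }

    function onError(err) {
      cleanup();
      reject(err);
    }

    function onEnd() {
      cleanup();
      resolve(null); // the stream ended without producing any data
    }

    stream.on('readable', onReadable);
    stream.on('error', onError);
    stream.on('end', onEnd);
  });
}
```

    With this in place, something like peekFirstChunk(FileStream).then((chunk) => detectCharacterEncoding(chunk)) could run before the stream is piped or given a 'data' listener. Keep in mind that a detection based on only the first chunk may be less reliable than one based on the full content.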