javascript node.js csv stream fast-csv

Reading from multiple CSV files and writing into one using streams


My program takes in CSV files and attempts to merge them into one CSV file. All of the CSV files will have the same columns.

I am using the fast-csv package to parse and format the rows of the CSV files, but I am having trouble getting them to all go into one file consecutively.

I am looping through the files and running a function that parses and formats the rows of each one, but the rows in the output file are out of order and some of the data from the input files is missing.

I think it’s due to the synchronous nature of the ‘for’ loop that I am using to loop through the different CSV file arguments, and the asynchronous nature of reading from different streams and attempting to write to a single stream.

I am looking for guidance on how to loop through the file arguments so that the parsing, formatting, and writing to the output file all complete for one file before the loop moves on to the next.

// Function that parses and formats the given file
function formatFile(paths, index) {
  // Initialize format options
  let formatOptions = {
    quoteHeaders: true,
    quoteColumns: true,
    escape: '\\'
  };

  // If the current file is the first file, write the headers of the file
  if (index === 0) {
    formatOptions.headers = true;
  // If the current file is not the first file, do not write the headers of the file
  } else {
    formatOptions.writeHeaders = false;
  }

  // Take in the current file as a readable stream
  fs.createReadStream(paths[index])
    // Parse the CSV file
    .pipe(csv.parse({ headers: true, escape: '\\' }))
    // Format the rows of the CSV file
    .pipe(csv.format(formatOptions))
    // Pipe the formatted data into the output CSV file
    .pipe(outputFile);
}

// Loop through each CSV file argument and run the formatFile function on each
for (let i = 0; i < filePaths.length; i++) {
  formatFile(filePaths, i);
}

Solution

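  • Use promises: have formatFile return a promise that settles once its file has been processed, and await that promise in the loop so the files are handled one at a time.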

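    // Function that parses and formats the given file.
    // Assumes fs, csv (fast-csv), and outputFile are defined as in the question.
    function formatFile(paths, index) {
      // Initialize format options
      const formatOptions = {
        quoteHeaders: true,
        quoteColumns: true,
        escape: '\\',
        // Finish each file with a row delimiter so the next file starts on a new line
        includeEndRowDelimiter: true
      };
    
      // If the current file is the first file, write the headers of the file
      if (index === 0) {
        formatOptions.headers = true;
      // If the current file is not the first file, do not write the headers of the file
      } else {
        formatOptions.writeHeaders = false;
      }
    
      return new Promise((resolve, reject) => {
        // Format the rows of the CSV file
        const formatter = csv.format(formatOptions)
          .on('error', reject)
          // 'end' fires once every formatted row has been passed on to outputFile
          .on('end', resolve);
    
        // Take in the current file as a readable stream
        fs.createReadStream(paths[index])
          .on('error', reject)
          // Parse the CSV file
          .pipe(csv.parse({ headers: true, escape: '\\' }))
          .on('error', reject)
          .pipe(formatter)
          // Pipe the formatted data into the output CSV file.
          // pipe() ends its destination by default, so end: false keeps
          // outputFile open for the files that follow.
          .pipe(outputFile, { end: false });
      });
    }
    
    // Loop through each CSV file argument and run the formatFile function on each
    for (let i = 0; i < filePaths.length; i++) {
      await formatFile(filePaths, i);
    }
    
    // Close the output file once the last input file has been written
    outputFile.end();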
    

    The function that contains the for loop must be an async function.
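
    The sequencing itself can be tried in isolation with only Node built-ins. The following is a minimal sketch, not the fast-csv pipeline: it copies raw file contents without parsing or formatting, and appendFile and concatFiles are hypothetical names introduced for illustration. It assumes the output stream must stay open between files, which is why each pipe() call passes end: false. Error handling on the output stream during the copy is left out for brevity.

```javascript
const fs = require('fs');
const { once } = require('events');

// Hypothetical helper: append one file to an already-open writable stream.
function appendFile(inputPath, output) {
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(inputPath);
    input.on('error', reject);
    // 'end' fires after the last chunk has been handed to output.write()
    input.on('end', resolve);
    // end: false keeps the destination open for the next file
    input.pipe(output, { end: false });
  });
}

// Hypothetical helper: concatenate files strictly in the order given.
async function concatFiles(inputPaths, outputPath) {
  const output = fs.createWriteStream(outputPath);
  for (const inputPath of inputPaths) {
    // The next file is not opened until this one has been fully read
    await appendFile(inputPath, output);
  }
  // Close the output and wait until everything has been flushed to disk
  output.end();
  await once(output, 'finish');
}
```

    Calling concatFiles(['a.csv', 'b.csv'], 'merged.csv') from an async function (the file names here are placeholders) writes the contents of a.csv followed by the contents of b.csv. Waiting for 'finish' at the very end matters when the program reads the merged file straight afterwards.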