javascript mongodb mongoose promise when-js

Promises and upserting to database in bulk


I am currently parsing a list of js objects that are upserted to the db one by one, roughly like this with Node.js:

return promise.map(list, item =>
    parseItem(item)
        .then(upsertSingleItemToDB)
).then(() => console.log('all finished!'));

The problem is that once the lists grew very large (~3000 items), parsing all the items in parallel became too memory-heavy. It was easy to add a concurrency limit with the promise library (when/guard) and avoid running out of memory that way.
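
For illustration, the concurrency limit described above can be sketched without any library. This is not the when/guard API, only a minimal stand-in for the same idea; `mapWithLimit` is a hypothetical helper name, and `fn` is assumed to return a promise (for example `item => parseItem(item).then(upsertSingleItemToDB)`).

```javascript
// Hypothetical helper: maps fn over items with at most `limit` calls in flight.
function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker keeps claiming the next index until the list is exhausted.
  const worker = async () => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  };

  // Start at most `limit` workers; together they drain the whole list.
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  return Promise.all(workers).then(() => results);
}
```

Results keep the order of the input list. If one call rejects, the returned promise rejects, but calls that are already in flight are not cancelled.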

But I'd like to optimize the db upserts as well, since MongoDB offers a bulkWrite function. Since parsing and bulk-writing all the items at once is not possible, I would need to split the original object list into smaller sets. Each set would be parsed with promises in parallel, and the result array of that set would then be passed to the promisified bulkWrite. This would be repeated for the remaining sets of list items.

I'm having a hard time wrapping my head around how to structure the smaller sets of promises so that I only do one set of parseSomeItems-BulkUpsertThem at a time (something like Promise.all([set1Bulk, set2Bulk]), where set1Bulk is another array of parallel parser promises?). Any pseudo code help would be appreciated (I'm using when, if that makes a difference).


Solution

  • It can look something like this if you are using Mongoose and the underlying Node.js MongoDB driver:

    const saveParsedItems = items => ItemCollection.collection.bulkWrite( // accessing underlying driver
      items.map(item => ({
        updateOne: {
          filter: {id: item.id}, // or any compound key that makes your items unique for upsertion
          upsert: true,
          update: {$set: item} // should be a key:value formatted object
        }
      }))
    );
    
    
    const parseAndSaveItems = (items, offset = 0, limit = 3000) => { // the algorithm for retrieving items in batches can be anything you want; limit is the batch size, so lower it if a batch does not fit in memory
      const itemSet = items.slice(offset, offset + limit); // slice takes an end index, not a length
      
      return Promise.all(
        itemSet.map(parseItem) // parsing all the items of this batch first
      )
        .then(saveParsedItems)
        .then(() => {
          const newOffset = offset + limit;
          if (items.length > newOffset) { // recurse only while items remain
            return parseAndSaveItems(items, newOffset, limit);
          }
          
          return true;
        });
    };
    
    return parseAndSaveItems(yourItems);
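
The same batching idea can also be written as a loop with async/await, which avoids the recursion. The following is only a sketch under a few assumptions: `chunk` and `parseAndSaveInBatches` are hypothetical names, and the parser and the bulk-save function are passed in as arguments so the example runs without a database. In practice you would pass `parseItem` and `saveParsedItems` from above.

```javascript
// Splits a list into consecutive slices of at most `size` elements.
const chunk = (list, size) => {
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
};

// Parses one batch in parallel, bulk-saves it, then moves on to the next batch.
// Resolves with the number of items handed to the save function.
async function parseAndSaveInBatches(items, batchSize, parse, saveBatch) {
  let saved = 0;
  for (const batch of chunk(items, batchSize)) {
    const parsed = await Promise.all(batch.map((item) => parse(item)));
    await saveBatch(parsed); // one bulk write per batch
    saved += parsed.length;
  }
  return saved;
}
```

Only one batch is parsed and held in memory at a time, and an empty list results in no bulk write at all.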