node.js performance asynchronous async-await scale

Prevent blocking the event loop on sync operations


The company I work for has a microservice which handles 300 requests per second, spread across 30 Node.js pods.

The metrics in DataDog showed high latency and CPU usage today, when there was a peak in requests.

I was watching the DataDog APM profiles and traces for different pods, and the function below seemed to be taking a lot of time to execute: 2.43 seconds wall time, 43% of a pod's profile. The pods' memory at peak time is around 600Mi.

const asyncFunc1 = async (items: SomeInterface[]) => {
  const response = await Promise.all(
    items.map(async (item) => {
      const response2 = await asyncFunc2(item);
      return response2;
    })
  );

  return response;
};

const asyncFunc2 = async (item: SomeInterface) => {
  const key = item.key;
  const inMemoryData = getInMemoryDataSync(key); // get cached data in memory - 99% of data is cached
  if (inMemoryData) {
    return JSON.parse(inMemoryData);
  }

  const redisData = await redisRepository.getData(key);
  if (redisData) {
    addInMemoryDataSync(redisData); // add cached data in memory
    return JSON.parse(redisData);
  }

  return null;
};

await asyncFunc1(items);

That code awaits every call to asyncFunc2.

The first part of asyncFunc2 is synchronous - getting the in-memory data. 99% of requests will find the cached data in memory and won't fetch it from Redis. That means an unnecessary await is applied to an operation that is not actually async.

What I found on Google is that this won't make the sync operation async, but the await will still create a microtask. The microtask executes before the next cycle of the event loop, so at 300 requests per second, 300 microtasks get prioritized over other event loop tasks.
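To see that ordering effect concretely, here is a minimal sketch (plain values instead of the real cache) showing that `await` yields to the microtask queue even when the awaited value isn't a Promise:

```typescript
// Minimal sketch: `await` always schedules a microtask, even for plain values.
const order: string[] = [];

const awaitsSyncValue = async () => {
  order.push("before await");
  await 42; // not a Promise, but the continuation is still deferred
  order.push("after await");
};

const run = async () => {
  const pending = awaitsSyncValue();
  order.push("caller continues synchronously");
  await pending;
  console.log(order);
  // -> ["before await", "caller continues synchronously", "after await"]
};

run();
```

Every cached item in the hot path pays for one of those deferred continuations.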

I thought about something like this -

const preventMicroTasksInSyncOperations = async (items: SomeInterface[]) => {
  const response: any[] = [];
  const keysToFetch: string[] = [];

  items.forEach((item) => {
    const key = item.key;
    const inMemoryData = getInMemoryDataSync(key); // get cached data in memory - 99% data is cached.
    if (inMemoryData) {
      response.push(JSON.parse(inMemoryData));
    } else {
      keysToFetch.push(key);
    }
  });

  await Promise.all(
    keysToFetch.map(async (key) => {
      const redisData = await redisRepository.getData(key);
      if (redisData) {
        addInMemoryDataSync(redisData); // add cached data in memory
        response.push(JSON.parse(redisData));
      }
    })
  );

  return response;
};

await preventMicroTasksInSyncOperations(items);

What do you think about that version of preventMicroTasksInSyncOperations? That way, the sync operations won't create microtasks and will execute one by one.
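One caveat worth checking before adopting the rewrite: the original Promise.all version returns results in the same order as items, while pushing Redis results at the end reorders them. Below is a hedged sketch of an order-preserving variant; the cache and Redis helpers here are simplified stand-ins for the ones in the post, and their signatures are assumptions:

```typescript
// Hypothetical stand-ins for the post's in-memory cache and Redis layer.
interface SomeInterface { key: string; }
const memory = new Map<string, string>();
const getInMemoryDataSync = (key: string) => memory.get(key);
const addInMemoryDataSync = (key: string, data: string) => memory.set(key, data);
const redisRepository = {
  getData: async (key: string) => JSON.stringify({ key }), // fake Redis hit
};

// Keeps each result in the slot of the item that produced it.
const orderPreservingVariant = async (items: SomeInterface[]) => {
  const response: unknown[] = new Array(items.length).fill(null);
  const misses: { index: number; key: string }[] = [];

  items.forEach((item, index) => {
    const inMemoryData = getInMemoryDataSync(item.key);
    if (inMemoryData) {
      response[index] = JSON.parse(inMemoryData); // sync path, no microtask
    } else {
      misses.push({ index, key: item.key });
    }
  });

  await Promise.all(
    misses.map(async ({ index, key }) => {
      const redisData = await redisRepository.getData(key);
      if (redisData) {
        addInMemoryDataSync(key, redisData);
        response[index] = JSON.parse(redisData); // lands in its original slot
      }
    })
  );

  return response;
};

orderPreservingVariant([{ key: "a" }, { key: "b" }]).then((out) =>
  console.log(out) // logs: [ { key: 'a' }, { key: 'b' } ]
);
```

If callers never rely on result ordering, the simpler push-based version above is fine.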

Thanks for any help.


Solution

  • Solved the high latency and CPU issues: I stopped using JSON.stringify() and JSON.parse() on the hot path. Heavy synchronous parsing like that is an anti-pattern for Node.js, which is single-threaded.

    Now the event loop isn't blocked, and I have reduced the live pod count in production by roughly 75% - from 30 live pods to 8 - and 55ms requests turned into 15ms requests.

    const asyncFunc1 = async (items: SomeInterface[]) => {
      const response = await Promise.all(
        items.map(async (item) => {
          const response2 = await asyncFunc2(item);
          return response2;
        })
      );

      return response;
    };

    const asyncFunc2 = async (item: SomeInterface) => {
      const key = item.key;
      const inMemoryData = getInMemoryDataSync(key);
      if (inMemoryData) {
        return inMemoryData; // already parsed
      }

      const redisData = await redisRepository.getData(key);
      if (redisData) {
        const parsed = JSON.parse(redisData);
        addInMemoryDataSync(parsed); // add parsed data to the in-memory cache
        return parsed;
      }

      return null;
    };

    await asyncFunc1(items);
    

    Redis is refreshed every hour, so all calls will use the parsed object directly from the in-memory cache for at least an hour.
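For anyone wanting to verify this kind of fix locally (a sketch, not part of the original solution), Node's built-in event-loop delay histogram in perf_hooks is one option; a sustained high mean or p99 delay indicates the loop is being blocked:

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Sample event-loop delay while the service handles traffic.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setTimeout(() => {
  histogram.disable();
  // The histogram reports nanoseconds; convert to milliseconds.
  console.log("mean (ms):", histogram.mean / 1e6);
  console.log("p99  (ms):", histogram.percentile(99) / 1e6);
}, 1000);
```

Running this before and after removing the synchronous JSON.parse calls makes the improvement visible without relying solely on the APM dashboards.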