Tags: javascript, html, dom, w3c

Why does the async File.stream().getReader().read() loop block the main thread?


<input type="file" id="el">
<code id="out"></code>

const el = document.getElementById('el');
const out = document.getElementById('out');
el.addEventListener('change', async () => {
  const file = el.files?.[0];
  if (file) {
    const reader = file.stream().getReader();
    out.innerText = JSON.stringify({ fileSize: file.size });
    let received = 0;
    while (true) {
      const chunk = await reader.read();
      if (chunk.done) break;

      // chunk.value.forEach((it) => it + 1);

      received += chunk.value.byteLength;
      out.innerText = JSON.stringify({
        fileSize: file.size,
        process: `${received} / ${file.size} (${((received / file.size) * 100).toFixed(2)}%)`,
      });
    }
  }
});

The code above works well; the <code> element shows progress in real time. But if I uncomment the line chunk.value.forEach((it) => it + 1);, the main thread seems to be blocked: the page stops responding until the file processing is complete. (Tested in Edge 125.)

I can use requestAnimationFrame to fix it. But why does this happen, and is there a better way than requestAnimationFrame?
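
For reference, the requestAnimationFrame workaround I mean is just awaiting a frame once per iteration:

// at the end of each loop iteration:
await new Promise(requestAnimationFrame); // resumes in the next animation frame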

---- Edit
chunk.value.forEach((it) => it + 1); is a simplification of the real code; what I actually want to do is calculate the MD5 hash of the file.
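
For context, the real per-chunk work is incremental hashing, roughly like the following (sketched with the SparkMD5 library's streaming API; not my exact code):

const spark = new SparkMD5.ArrayBuffer();
while (true) {
  const chunk = await reader.read();
  if (chunk.done) break;
  spark.append(chunk.value.buffer); // heavy synchronous work per chunk
}
const md5 = spark.end(); // hex digest of the whole file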

The browser seems to limit the size of each chunk to keep each loop iteration around 16 ms; the extra code hardly affects the timing or the chunk size:

while (true) {
  const start = Date.now();
  const chunk = await reader.read();
  if (chunk.done) break;
  // ...
  console.log('chunk', Date.now() - start, chunk.value.byteLength);
}

// chunk 7 524288
// chunk 17 1572864
// chunk 25 2097152
// chunk 16 2097152
// chunk 18 2097152
// chunk 15 2097152
// chunk 16 2097152
// ...

Solution

  • I'm afraid this behavior is actually per spec, even if it makes for a terrible experience...

    The pull steps of a ReadableStream controller can return already-queued chunks synchronously. So if your callback in the read() reaction takes longer to execute than the filesystem takes to queue new chunks, the browser never hands control back to the event loop: you end up in a microtask loop, which very much blocks the UI.
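
    You can reproduce this microtask loop without any File involved. Here is a minimal illustration: all the chunks are queued up-front, so every read() settles from the queue in a microtask and the loop never yields to the task queue:

    const stream = new ReadableStream({
      start(controller) {
        // queue many chunks synchronously; reads are then served from the queue
        for (let i = 0; i < 100000; i++) controller.enqueue(i);
        controller.close();
      },
    });
    setTimeout(() => console.log("task"), 0); // starved until the loop finishes
    const reader = stream.getReader();
    (async () => {
      while (!(await reader.read()).done) {} // pure microtask loop
      console.log("loop done"); // logs before "task"
    })();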

    Interestingly, Firefox does seem to queue a task somewhere, since it doesn't block the UI, contrary to Chrome. Maybe one could make a case at the spec level for that to become the standard. (Chrome has exposed the blocking behavior for quite some time though, since at least M84, so it might not be treated as that big an issue...)

    For the time being, to avoid it you could queue a task from the callback. The fastest way to do so is the still-experimental scheduler.postTask() method with a priority of "user-blocking":

    await scheduler.postTask(() => {}, { priority: "user-blocking" });
    

    and for browsers that don't support this method yet, you can monkey-patch it with a MessageChannel():

    {
      const { port1, port2 } = new MessageChannel();
      globalThis.scheduler ??= {
        postTask(cb, options) {
          return new Promise((resolve, reject) => {
            port1.addEventListener("message", () => {
              try { resolve(cb()); } catch(err) { reject(err); }
            }, { once: true });
            port2.postMessage("");
            port1.start();
          });
        }
      };
    }
    

    Here is the full demo, reusing the monkey-patch above and simulating 100 ms of synchronous work per chunk:

    scheduler.postTask(() => {}).then(() => console.log("yep")); // sanity-check the (possibly patched) scheduler
    const el = document.getElementById('el');
    const out = document.getElementById('out');
    el.addEventListener('change', async () => {
      const file = el.files?.[0];
      if (file) {
        const reader = file.stream().getReader();
        out.innerText = JSON.stringify({ fileSize: file.size });
        let received = 0;
        while (true) {
          const chunk = await reader.read();
          if (chunk.done) break;
          // simulate 100 ms of heavy synchronous work per chunk
          const t1 = performance.now();
          while (performance.now() - t1 < 100) {}
          await scheduler.postTask(() => {}, { priority: "user-blocking" });
          received += chunk.value.byteLength;
          out.innerText = JSON.stringify({
            fileSize: file.size,
            process: `${received} / ${file.size} (${((received / file.size) * 100).toFixed(2)}%)`,
          });
        }
      }
    });
    <input type="file" id="el">
    <code id="out"></code>

    But the best solution in your case might actually be to use a Web Worker and send your file there: even if queuing a task in the loop lets the event loop breathe a bit, it will still struggle to handle all the UI updates with so little time available. Note that sending a File object (or a Blob) to a worker context doesn't copy the data, so you don't need to worry about memory usage.
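
    A minimal sketch of that approach (the worker file name and message shape are made up for illustration):

    // main thread
    const worker = new Worker("hash-worker.js"); // hypothetical worker script
    el.addEventListener("change", () => {
      const file = el.files?.[0];
      if (file) worker.postMessage(file); // structured clone; the bytes aren't copied
    });
    worker.addEventListener("message", (e) => {
      out.innerText = JSON.stringify(e.data);
    });

    // hash-worker.js
    self.onmessage = async (e) => {
      const file = e.data;
      const reader = file.stream().getReader();
      let received = 0;
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // heavy per-chunk work (e.g. hashing) runs here, off the main thread
        received += value.byteLength;
        self.postMessage({ fileSize: file.size, received });
      }
    };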


    PS: I opened bug 355256389; let's see how it goes.