javascript html filereader

How do I read the first N bytes of a file from an HTML File input?


Situation

Say a user can upload a file on a web page. The file is usually large (at least 80 MB, often far more) and must be of a specific type, for example PDF.

Since these files are huge, we don't want to waste bandwidth uploading one only to find out that its type is wrong. We therefore want to verify on the client side that the file really is a PDF, and only THEN send it.

Fortunately, the PDF file format has a 5-byte magic number: 25 50 44 46 2D (hex), which is the ASCII string %PDF-.
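To make the comparison concrete, here is a minimal sketch of the check itself, assuming the leading bytes are already available as a Uint8Array. The names PDF_MAGIC and startsWithMagic are made up for illustration and are not part of any browser API.

```javascript
// The five PDF magic bytes 25 50 44 46 2D, i.e. the ASCII string "%PDF-".
const PDF_MAGIC = Uint8Array.of(0x25, 0x50, 0x44, 0x46, 0x2d);

// Hypothetical helper: true if `bytes` (a Uint8Array) begins with all of `magic`.
function startsWithMagic(bytes, magic) {
  if (bytes.length < magic.length) return false;
  return magic.every((value, index) => bytes[index] === value);
}
```

How to obtain those leading bytes without reading the whole file is exactly what the question is about.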

(PDF is only an example; it could be any file format that can be identified by its magic bytes, which we consider a good enough verification here. The question is also relevant to other cases, so please treat the PDF example solely as one practical illustration of the problem.)

Hence my question: how would I read the first 5 bytes of the file or, more generally, the first N bytes of a file?

You wouldn't want to read the full file, since it can be huge and the client's hard drive might be slow. You really only need those five bytes, and only if they are correct do you read the rest of the file to send it to the server.

If there isn't a way, are there any workarounds or ongoing proposals for such a feature?

What I've tried

The FileReader API allows reading a file into an array buffer (see this answer and the docs):

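let reader = new FileReader();

reader.onload = function () {
  // this.result is an ArrayBuffer holding the entire file
  let arrayBuffer = this.result,
    array = new Uint8Array(arrayBuffer),
    binaryString = String.fromCharCode.apply(null, array);

  console.log(binaryString);
};
// `this` is assumed to be the <input type="file"> element, e.g. inside its change handler
reader.readAsArrayBuffer(this.files[0]);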

This, however, reads the whole file.

Similar questions that do not give a solution to my question

Comments

(Responding to significant comments here, since comments are meant to be temporary)

Does slicing of the actual file itself help? https://stackoverflow.com/a/24845020/1427878 – @C3roe

It gives me the expected result, but what guarantees that it really reads only the first n bytes, rather than reading everything and then slicing it? Are there any implementation details about this in the standards? MDN states: "a new Blob object which contains data from a subset of the blob on which it's called.", implying there was a full blob to slice from in the first place.


Solution

  • In client-side JavaScript, a File object represents a file on the local file system. It does not immediately read the file into the browser's memory. Reading is an asynchronous operation, whereas a File object is available synchronously.

    File is a subclass of Blob whose slice method produces another Blob (again, synchronously) that represents a subset of contiguous bytes from the file (again, without reading them).

    Actually reading the contents of the file or of a slice requires calling the asynchronous text method (or the bytes or arrayBuffer methods), or using a ReadableStream obtained via the stream method. All of these are asynchronous.

    One can therefore read just a small slice of a file:

    <script>
      // Logs the first 5 bytes of the chosen file as text, e.g. "%PDF-" for a PDF.
      async function sniff(file) {
        console.log(await file.slice(0, 5).text());
      }
    </script>
    <input type="file" onchange="sniff(this.files[0])">
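Building on the slice approach above, the following is a minimal sketch of a byte-level check that compares raw bytes instead of decoding them as text. The helper names readFirstBytes and hasMagic are hypothetical, not part of the File API; the only API calls used are Blob.slice and Blob.arrayBuffer. It assumes, as described above, that the browser reads just the requested slice, which the code itself cannot verify.

```javascript
// Hypothetical helper: returns at most `n` bytes from the start of a Blob or File.
// Only the slice is handed to the read operation, never the full blob.
async function readFirstBytes(blob, n) {
  const buffer = await blob.slice(0, n).arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical helper: true if the blob starts with the given magic bytes.
// `magic` can be a plain array or a Uint8Array of byte values.
async function hasMagic(blob, magic) {
  const head = await readFirstBytes(blob, magic.length);
  // A file shorter than the magic number cannot match.
  if (head.length !== magic.length) return false;
  return magic.every((value, index) => head[index] === value);
}
```

For the PDF example, a change handler could await hasMagic(this.files[0], [0x25, 0x50, 0x44, 0x46, 0x2d]) and start the upload only if it resolves to true. Comparing bytes rather than text also works for formats whose magic numbers are not printable characters.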