javascriptmemoryv8typed-arrays

Memory overhead of typed arrays vs strings


I am trying to reduce the memory usage of a JavaScript web application that stores a lot of information in memory in the form of a large number of small strings. When I changed the code to use Uint8Array instead of String, I noticed that memory usage went up.

For example, consider the following code that creates many small strings:

// (1000000 strings) x (10 characters)
var a=[];
for (let i=0; i<1000000; i++)
    a.push("a".repeat(10).toUpperCase());

If you put it in an empty page and let the memory usage settle for a few seconds, it settles at 70 MiB on Google Chrome. On the other hand, the following code:

// (1000000 arrays) x (10 bytes)
var a=[];
for (let i=0; i<1000000; i++)
    a.push(new Uint8Array(10));

uses 233 MiB of memory. An empty page without any code uses about 20 MiB. On the other hand, if I create a small number of large strings/arrays, the difference becomes smaller and in the case of a single string/array with 10000000 characters/entries, the memory usage is virtually identical.

So why do typed arrays have such a large memory overhead?


Solution

  • V8 developer here. Your conclusion makes sense: If you compare characters in a string to elements in a Uint8Array, the string will have less overhead. TypedArrays are great at providing fast access to typed elements; however having a large number of small TypedArrays is not memory efficient.

    The difference is in the object header size for strings and typed arrays.

    For a string, the object header is:

    1. hidden class pointer
    2. hash
    3. length
    4. payload

    where the payload is rounded up to pointer size alignment, so 16 bytes in this case.

    For a Uint8Array, you need the following:

    1. hidden class pointer

    2. properties pointer (unused)

    3. elements pointer (see below)

    4. array buffer pointer (see below)

    5. offset into array buffer

    6. byte length

    7. length of view into array buffer

    8. length (user-visible)

    9. embedder field #1

    10. embedder field #2

    11. array buffer: hidden class pointer

    12. array buffer: properties pointer (unused)

    13. array buffer: elements pointer (see below)

    14. array buffer: byte length

    15. array buffer: backing store

    16. array buffer: allocation base

    17. array buffer: allocation length

    18. array buffer: bit field (internal flags)

    19. array buffer: embedder field #1

    20. array buffer: embedder field #2

    21. elements object: hidden class pointer

    22. elements object: length (of the backing store)

    23. elements object: base pointer (of the backing store)

    24. elements object: offset to data start

    25. elements object: payload

    where, again, the payload is rounded up to pointer size alignment, so consumes 16 bytes here.

    In summary, each string consumes 5*8 = 40 bytes, each typed array consumes 26*8 = 208 bytes. That does seem like a lot of overhead; the reason is due to the various flexible options that TypedArrays provide (they can be overlapping views into ArrayBuffers, which can be allocated directly from JavaScript, or shared with WebGL and whatnot, etc).

    (It's not about "optimizing memory allocation" nor being "better at garbage collecting strings" -- since you're holding on to all the objects, GC does not play a role.)