Tags: javascript, performance, v8

Performance degrades on JSObject after 2^23 items


I'm having trouble understanding the behavior of the code below: its performance worsens dramatically after 2^23 (8,388,608) items. I thought it might have something to do with how JSObject is implemented, but it turns out that if I replace the string keys with some sort of hashes, performance is actually pretty good.

I want to understand how V8 works under the hood with this example:

function foo() {
    const result = Object.create(null);
    for (let i = 0; i < 25_000_000; i += 1) {
        // Time each individual property insertion.
        console.time(`Prop ${i}`);
        result[`prop_${i}`] = i + (Math.random() * 25_000_000);
        console.timeEnd(`Prop ${i}`);
    }
    return result;
}


const a = foo();

// %DebugPrint is a V8 intrinsic: it dumps the object's internal representation.
%DebugPrint(a);

console.log(a[`prop_24000000`]);
console.log(Object.keys(a).length);

To run it:

$ node --allow-natives-syntax file.js

The recorded time is around 0.001 ms before 2^23 items, but after that, it increases to approximately 1.5 seconds per iteration.

I'd also like to profile it to check whether the theories are correct, so if you have any, please let me know how I can profile it.

Thanks in advance!


Solution

  • (V8 developer here.)

    V8 represents large objects as dictionaries under the hood. (That's because its hidden-class system isn't meant for huge objects and wouldn't work well for them, so it has an internal limit of tracking up to 1020 properties, slightly less than 2**10.)
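
    If you want to see that transition directly, a small probe like the one below (run with --allow-natives-syntax, like your snippet) reports when V8 gives up on the hidden-class representation for a given object; the loop bound and the exact switch-over point are illustrative, not documented constants.

    // %HasFastProperties is a V8 intrinsic; it returns false once the object
    // is backed by a dictionary instead of a hidden class.
    function findDictionaryThreshold() {
        const obj = {};
        for (let i = 0; i < 5000; i += 1) {
            obj[`prop_${i}`] = i;
            if (!%HasFastProperties(obj)) {
                return i + 1;  // property count at the moment of the switch
            }
        }
        return -1;  // never switched within this loop bound
    }
    console.log(`dictionary mode after ${findDictionaryThreshold()} properties`);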

    JavaScript as a language requires object properties to be iterated in insertion order, so when an engine chooses a dictionary representation for an object, it must store the insertion/enumeration index with each entry (because dictionaries, by nature, arbitrarily reshuffle their entries internally). V8 caches the dictionary's next enumeration index (the one the next entry will get) in a bit field that happens to be 23 bits wide (out of a 32-bit slot; the other bits are used for other things). When that's exceeded, the next enumeration index has to be computed on every insertion by walking the entire dictionary. That's pretty slow, which is why it's normally avoided via the cached "next index".
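
    As a conceptual model only (not V8's actual code), the difference between the two paths looks roughly like this: while the cached counter is usable, each insertion gets its index in O(1); once it isn't, the index has to be recomputed by scanning every existing entry, so each insertion becomes O(n).

    // Conceptual sketch; `cacheUsable` stands in for "the 23-bit next-index
    // field has not saturated". A real hash dictionary also reshuffles its
    // backing store, which is why the index must be stored with each entry.
    class OrderedDict {
        constructor() {
            this.entries = [];        // { key, value, enumIndex }
            this.nextEnumIndex = 0;   // cached "next index"
        }
        insert(key, value, cacheUsable) {
            let enumIndex;
            if (cacheUsable) {
                enumIndex = this.nextEnumIndex;        // fast path: O(1)
            } else {
                enumIndex = 0;                         // slow path: walk all entries
                for (const e of this.entries) {
                    enumIndex = Math.max(enumIndex, e.enumIndex + 1);
                }
            }
            this.entries.push({ key, value, enumIndex });
            this.nextEnumIndex = enumIndex + 1;
        }
    }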

    This is clearly unfortunate, but it's also assumed to be very rare in practice, as most objects don't get that big. I don't know how feasible it would be to do something about this performance cliff.
    In the meantime, your best bet is probably to use a Map, which scales better to large sizes. If you need even more entries than a Map can hold, consider a sharded system, i.e. multiple smaller Maps (or objects) each being responsible for part of your index space.
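
    A sharding sketch could look like the following; the class name and hash function are illustrative, not an existing API. Each key is routed to one of several smaller Maps so no single Map grows into the problematic size range.

    // Illustrative only: spread entries across several smaller Maps.
    class ShardedMap {
        constructor(shardCount = 16) {
            this.shards = Array.from({ length: shardCount }, () => new Map());
        }
        shardFor(key) {
            // Simple string hash to pick a shard; any stable hash works.
            let h = 0;
            for (let i = 0; i < key.length; i += 1) {
                h = (h * 31 + key.charCodeAt(i)) | 0;
            }
            return this.shards[Math.abs(h) % this.shards.length];
        }
        set(key, value) { this.shardFor(key).set(key, value); }
        get(key) { return this.shardFor(key).get(key); }
        get size() { return this.shards.reduce((n, m) => n + m.size, 0); }
    }
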

    If you insist on profiling this yourself, you can use the d8 shell's --prof option with a suitable test case, and then your platform's "tick processor" script (e.g. for Linux). You'll see certain C++ functions high up on the profile that have to do with Runtime_SetKeyedProperty and EnumIndexComparator. You can then use code search to understand the mechanisms that are at work under the hood, in particular BaseNameDictionary::NextEnumerationIndex.