How does cache locality impact the performance of ArrayList compared to LinkedList in Java?
I've often heard that ArrayList has an advantage in terms of cache locality, but I don't fully understand why. Since Java stores objects in memory as references, wouldn't accessing elements in either list require jumping to random locations in memory?
Hold on, let's first illustrate how java works, because without that knowledge you can't make heads or tails of these conflicting statements about 'everything is a link'.
In java, every variable is fixed size and very small. This is highly convenient, because it allows making such grand, sweeping statements such as "whenever you call a method, every parameter is copied to the method you are calling; that method can do whatever it wants with these copies, and the copies poof out of existence once that method has finished execution". After all, if you make that statement and a variable is 2 GB large, then passing it around will very very swiftly cause out of memory errors.
But, how does that work? Surely List<String> list = enumerateEveryWordInTheCollectedWorksOfShakespeare();
is not "fixed, small size".
That's where the second part kicks in: In java, you have primitives, which is this hardcoded list of types: int
, long
, short
, byte
, double
, float
, char
, boolean
. (That's been the list since forever, this list will never change, you cannot make your own primitives ^1) - each and every one of those is 'fixed, small size' (specifically, they are all 64 bits or less). And everything else is a reference. A pointer.
When you write:
String x = "hello";
Then it is incorrect to say "x is hello". It's not. x is a reference to it.
It's like "Hello" is a house (think about it: Strings can be arbitrarily long. They can be the entire collected works of shakespeare. Very much not 'fixed, small size'), and x is merely a page in an address book. It's an instruction that tells you how to get to the house. Given that page, if you wanna know if the house is red, you can do it - just.. go there, and look. All you need is the page. But you do need to spend the time, so to speak.
ArrayList
ArrayList is 'a list of links' in that exact sense: By definition, you can't have an arraylist of primitive values (you can have a List<Integer>
; Integer
is the boxed, i.e. 'reference' version of int
. List<int>
is not valid java), hence, it must be a list of references. That means an ArrayList
is an address book. It's a list of addresses.
The list is 'contiguous' - meaning, the address book is in one piece, and in the order you expect it to be. If you're on page 5 of the address book, and you're intrigued about the address on page 6, then I can guarantee you page 6 is extremely closeby. Simply.. flip the page, and there it is. Guaranteed.
One downside of arraylists is that, like actual address books, they have a fixed size. So what happens when you fill up your address book? Well, the implementation of ArrayList will play some secret magic to make it appear as if the list doesn't actually have a 'whoops it is full' issue: It buys a new, larger address book, manually copies over each and every address in the old one, swiftly replaces your address book with this new one, and chucks the old one in the garbage, all just as part of its normal operations. Calling methods isn't so much 'doing' a thing to an object, it's asking the object to do it for you. ArrayList is an address book that is capable of understanding how to go out, buy a new larger one, copy itself into the new one, transfer its consciousness into the new one and then chuck its old self in the bin.
LinkedList
LinkedList is like an address book that you rip each page out of, and then just scatter them all across the room. You know where page 1 of the address book is. But that is all you know. Fortunately, each and every page lists the location where you hid the next page, as well as the previous page. So, if you want to, say, go to the house listed in your address book on page 5, because you want to know what colour it is painted, you find page 1, and it says "You shoved page 2 behind the couch", you go find page 2, which says "page 3 is on top of the refridgerator", you find page 3, which says "page 4 is on the desk", and page 4 says "page 5 is also on the desk", and then finally you can go over to that house.
This is a ridiculously time consuming process. It does have one single advantage: You never need to buy a new address book and copy the old full one over to it. After all, with this scattered pages system, if you run out of pages, just scatter some more blank pages across the room and keep going. This isn't much of an advantage, but it is something, I guess.
Now, if you so happen to create a linked list and immediately fill the first 500 pages of it, then the situation is likely that all 500 pages are still in a stack on your desk more or less sorted exactly. But because you can't be sure, even if this is the case, if you want to go to the house listed on page 250, you still have to go through each page in turn, whereas with the arraylist address book you can just go right to page 250 in one go.
But LinkedList does not require that you fill it up neatly as you make it. If you fill it up over time, then you get the 'pages are all over the place' scenario.
Each 'page' of our linkedlist address book actually stores 3 things. Not one thing. It stores the location of the previous page, the location of the next page, and, of course, the address of the house. Each of those things is a 'fixed small thing' (a reference). But those 3 together - that's not how java works, that's too much.
So in actual fact a LinkedList is more like 'each page of the address book is actually a tiny little address book with 3 pages in it, one explaining where the previous postit is, one where the next postit is, and one with the address of the house', and those mini-address books are all over the place, as well as a bunch of postits that explain where to find the mini address books.
A LinkedList consists of a reference to the first node and the last node. A node is a small object that consists of a reference to the next node, to the previous node, and to the object this entry in the list represents. To e.g. iterate through the first 10 items of a linked list and e.g. print them all, the JVM has to resolve the reference to the linked list, from there resolve the reference to the first node, from there resolve the reference to the object and print that, then resolve the reference to the next node, resolve the reference to the object and print that, resolve the reference to the next node, and so on.
Each little node object is created as you add an object, and unless you so happen to all do that neatly all in one go, these node objects are scattered all over your heap. And of course, the actual objects that the list contains may or may not be scattered all over the heap, depends on when they were made.
In contrast, with an ArrayList
, you simply have a guaranteed consecutive series of references to the actual objects contained in the list. Those objects may not be consecutive, but at least the refs to them are.
It boils down to just 2 words, which are all you must know as a java programmer about linked lists.
That's it. That's all. The number of cases where an LL is the correct answer is vanishingly small. ArrayList is often better, but certainly not always; however, some other collection variant will be better than LL in pretty much every imaginable use case.
Various language agnostic takes on LL make claims that might apply to a LinkedList but they do not apply to java's take on them. For example, 'the advantage of a linked list is that, given an object in the list, it is very fast to insert an object right after it'. This is false - given an entry in a linked list you can't "get back" to the linked list's node in java. In essence the one and only way to even begin to enjoy the benefits of LLs in java is to use the very rarely used .listIterator()
method which indeed can do a quick insertion of a value in a way ArrayList cannot compete with. Without listIterator the only thing LL can do that AL cannot do, is 'fast add/remove both from start and from end.
But if that's what you want, use ArrayDeque
- much more efficient at the job of 'a list thing that lets you fast add/remove from either end'.
[1] Various aspects of valhalla and panama mean these statements are going to need a lot of caveats soon. As of JDK24, this is accurate. If you know what Project Valhalla and Panama are - yes, change is on the horizon.