In the documentation for java.util.RandomAccess [1] I see this note:
For example, some List implementations provide asymptotically linear access times if they get huge, but constant access times in practice.
What could such a data-structure look like? Clearly, the authors of this doc had something in mind.
The best thing I can think of would be to make a linked list of arrays, as discussed in this question. I wonder whether this is a real-world option and whether there are others (the question I linked even calls it a 'half-measure').
Bonus points:
[1] A marker interface to distinguish list implementations with fast random access: an array-backed implementation would implement it, a linked list would not. But this is not a Java-specific question.
Yes, a linked list of arrays is an example of this, where each array has a maximum size. Such a structure works well when the number of elements to be stored is of the same order as, or a smaller order than, this maximum size, i.e. as long as the linked list only needs a handful of nodes, a lookup costs only a small constant number of steps. Whether you implement it as a linked list of arrays or as a top-level array of arrays is an implementation detail: in either case you need to traverse the top-level structure to find the right bottom-level array (based on the accumulated sizes of those arrays). Note that with a top-level array you can use binary search to arrive at the required "bucket" (lower-level array) more efficiently.
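To make this concrete, here is a minimal sketch (my own illustration, not the implementation of any particular library) of the "top-level array of arrays" variant: elements live in fixed-capacity buckets, a running array of cumulative bucket sizes is maintained, and get(i) binary-searches those cumulative sizes to locate the right bucket before indexing into it directly.

```java
import java.util.ArrayList;

/**
 * Sketch of a two-level segmented list. Buckets hold at most
 * BUCKET_SIZE elements; `prefix` stores the cumulative element
 * count up to and including each bucket, so get(i) can binary-search
 * for the first bucket whose cumulative count exceeds i.
 */
class SegmentedList<T> {
    private static final int BUCKET_SIZE = 1024;  // maximum elements per bucket
    private final ArrayList<ArrayList<T>> buckets = new ArrayList<>();
    private final ArrayList<Integer> prefix = new ArrayList<>();  // cumulative sizes

    void add(T value) {
        if (buckets.isEmpty() || lastBucket().size() == BUCKET_SIZE) {
            buckets.add(new ArrayList<>());
            prefix.add(prefix.isEmpty() ? 0 : prefix.get(prefix.size() - 1));
        }
        lastBucket().add(value);
        prefix.set(prefix.size() - 1, prefix.get(prefix.size() - 1) + 1);
    }

    T get(int index) {
        // Binary search for the first bucket whose cumulative size exceeds index.
        int lo = 0, hi = buckets.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (prefix.get(mid) <= index) lo = mid + 1;
            else hi = mid;
        }
        int before = (lo == 0) ? 0 : prefix.get(lo - 1);
        return buckets.get(lo).get(index - before);
    }

    private ArrayList<T> lastBucket() {
        return buckets.get(buckets.size() - 1);
    }
}
```

With a linked list of nodes instead of the top-level ArrayList, the binary search would have to degrade to a linear walk over the nodes, which is exactly why the top-level traversal dominates once the collection gets huge.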
This structure is like a restricted form of a B+ tree, where the depth is limited to two and a high branching factor is used (one that can grow if the collection becomes huge). There is also some similarity with skip lists (again restricted to two levels). It is a potential candidate for implementing sorted containers, where you can efficiently access the kth element in sorted order.
A popular implementation can be found in the Python world: the Sorted Containers library:
Sorted Containers uses a segmented-list data structure similar to a B-tree limited to two levels of nodes. As part of the implementation, a load factor is used to determine how many values should be stored in each node.
The greater this "load factor", the larger the collection must become before access times enter the linear regime. Likewise, for smaller collections, a greater load factor means access time grows less as the size increases. The author of the above library provides this graph:
Here we see exactly that: the greater load factor "delays" the linear behaviour, pushing it out to larger collection sizes.
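A back-of-the-envelope way to see this (assuming, for illustration, that the load factor is simply the bucket capacity): a collection of n elements needs about ceil(n / load) top-level nodes, and a linear walk over the top level costs up to that many steps. Doubling the load factor therefore halves the step count at every size, which is what stretches the flat part of the curve.

```java
// Number of top-level nodes, and hence the worst-case number of
// traversal steps for a linear top-level walk, when each bucket
// holds up to `load` elements. (Illustrative model, not the exact
// accounting any particular library uses.)
static long topLevelNodes(long n, long load) {
    return (n + load - 1) / load;  // ceil(n / load)
}
```

For example, a million elements need 10,000 top-level nodes at load 100, but only 1,000 at load 1000, while any collection smaller than the load factor fits in a single node and is accessed in one step.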