I am trying my best to understand how ConcurrentHashMap works under the hood.
It seems that during resizes there is an entire encoding scheme happening within sizeCtl
variable.
Some speculations are saying that the lower 16 bits
signify the number of threads, other speculations specify that there is a point system counter implemented, ie. +1
when a thread is doing the resizing and -1
for when a thread is leaving the resizing.
https://stackoverflow.com/a/52668122/7134737
https://stackoverflow.com/a/53477058/7134737
Can someone explain in plain terms what the following variables do :
How do they interact with the sizeCtl
variable ? It seems this variable is used for multiple operations, none of which are very well documented.
Sorry if this seems like a rant, but its frustrating to not understand the bit manipulations.
These values are only used internally, so they don't need to be well documented.
Still, let's try to explain the different roles and values that sizeCtl
has:
table
is null: initial table size when specified in the constructor, 0 for default table size (DEFAULT_CAPACITY
which is 16) - this value is always greater or equal to 0table
is being initialized because some thread put the first value into the ConcurrentHashMap
- either by calling the constructor with a Map
of values or by calling one of the entry adding methods.table
is not null: number of entries where the next resize will be started, calculated as n - n/4
with n being table.length
- this value is always greater than 0The special value while resizing is built from two parts:
RESIZE_STAMP_BITS
(16) and is placed in sizeCtl
by shifting it left by RESIZE_STAMP_SHIFT
(32 - RESIZE_STAMP_BITS
which is incidentally also 16)32 - RESIZE_STAMP_BITS
(16)You can picture this as
31 16 15 0
+--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--+
| resizeStamp | resizerCount |
+--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--+
MAX_RESIZERS
is just the maximal value the the "resizerCount" part can hold.The resizerCount is used to control the acquisition of additionals threads to help out in resizing the ConcurrentHashMap
. The thread that initiates the resizing operating sets its value to 2 (plus the value resizeStamp << RESIZE_STAMP_SHIFT
). If additional threads try to add entries during a resize operation they check whether there are table entries to be moved and the value of the resizerCount is less than MAX_RESIZERS
. If this is the case they join the resize operation by incrementing the resizerCount and the start moving map entries from the old table
to the nextTable
.
Moving map entries from the old table
to the nextTable
is done in blocks (to prevent contention). After each block the threads participating in the resize operation check whether there are more blocks to move. If there are no more blocks, the thread decrements the resizerCount, checks if it is the last thread doing resizing (indicated by resizerCount now being 1) and if it is the last thread will finish the resize operation: change table
to nextTable
and set sizeCtl
to the amount of entries that will trigger the next resize operation.
Why is the resizeStamp needed?
Because the threads must coordinate the resize work. A thread "X" that decides to participate in resizing reads the values of the table
and nextTable
fields and then tries to join the group of threads doing the resizing.
It can happen that the thread "X" is suspended between reading the fields and joining the group of threads doing the resizing work and that the resizing of the table
that it has read is already completed but a new resize is in progress. The value in resizeStamp encodes the size of the table
array and lets the thread "X" detect that situation which means that it must re-read the values of the table
and nextTable
fields.