javaconcurrencylock-freeaba

Why does automatic garbage collection eliminate ABA problems?


I have investigated ABA problem in Concurrency in practice book, in Wikipedia and I have read following post

As I understand the root cause of ABA problem that in algoritm we check that state same as was before but algorithm implies that state was untouched.

Example with a stack data structure:

to add element to stack we use following algorithm:

create new stack node(save to `newNode` variable)

while(true) {
    oldHead = stack.get();
    newNode.next = oldHead; // point_1
    if(stack.compareAndSet(oldhead, newNode)) { // atomically replace head if now head same as was in start of iteration
        break;
    }    
}

Steps which lead to the ABA problem:
Initial state

a->b->c  // a-head, c- tail.
  1. Thread_1 tries to add value d to the stack and OS suspend the thread before compareAndSet operation(point_1)

  2. Thread_2 then execute pop (Thread_1 still suspended)

    b->c  // b-head, c- tail.
    
  3. Thread_3 then execute pop (Thread_1 still suspended)

    c // c-head, c- tail.
    
  4. Thread_4 then executes push a(Thread_1 still suspended)

    a->c // a-head, c- tail.
    
  5. Thread_1 wakes up and and cas operation executes successfully although it can be undesirable in some cases.

Although this answer is accepted I don't understand why automatic garbage collection eliminates the problem.

Although I am not expert in C I understand that in C you cannot allocate one memory range for two different object.

Can you clarify it more clearly?


Solution

  • Part of your confusion probably comes from the point that you are mixing up data structure itself with contents.

    In 'proper' implementation, you will have stack nodes containing data, not being data itself. So, what you end up with is Node(a), Node(b) etc - so when some thread is pushing c to stack, it will pass c, but it will be Node(c) which is actually pushed.

    This means, that in such environment, thing entered into stack at step 4 won't be just a, it will be new Node(a), which will be different pointer from Node(a) which other thread has seen at step 1. These objects might be very well equal business-wise (so in java for example, they equals method will return true), but pointer-wise comparison will differ. And here we come to part where automatic GC makes a difference. Blocked thread from step 1 still holds reference to old instance of Node(a) on stack/registers, even if it was removed later from stack and there are no strong references to it anywhere in the heap. This prevents that node from being deleted and memory address from getting reused, which would fool CAS.

    Please note that bad situation you are referring to here could only happen in non-GC language, if you delete (memory-wise) original Node(a) while thread 1 is still in critical section. This is quite tricky - you leave thread with pointer to freed block of memory and need to be really, really sure it won't be dereferenced at any later point, as it would end up with a lot worse result that ABA (you could read any garbage from freed block).

    If you don't implement layer of indirection in form of Node(x) and just let client calls to push/pop/modify internal nodes directly, then all bets are off - client can just push same node twice for example, leading to infinite loop later. In your example, it would first delete and then later reinsert same node, but that assumes a lot of leaking state between data structure and client code - very dangerous thing to do in multithreaded environment, especially if you want to experiment with lock-free structures.

    To summarize, automatic GC does not protect from all cases of ABA. It protects from very specific case of ABA, related to memory reuse by malloc, for aggresively optimized lock-free code which holds references to dead objects.