[SOLVED] How does restricted transactional memory / HTM work in detail?

I am learning about hardware transactional memory (HTM), but detailed information about its implementation is hard to find. I know that a transaction in HTM buffers its read/write set in the L1 cache and detects conflicts through the cache coherence protocol. The example usage I learned is below.

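#include <immintrin.h> // RTM intrinsics; compile with -mrtm

while (1) { // keep trying
    int status = _xbegin(); // try to start a transaction
    if (status == _XBEGIN_STARTED) { // _XBEGIN_STARTED is ~0u, i.e. -1 as an int
        (*a)++; // non-atomic increment of a shared global variable
        (*b)++;
        _xend(); // commit the transaction
        break; // break on success
    } else {
        // the transaction aborted and status holds the abort cause;
        // _xabort() is a no-op outside a transaction, so just loop and retry
    }
}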

So, I am confused about what happens when a conflict occurs between "(*a)++" and "(*b)++". Say T0 increments a while T1 reads a. The cache coherence protocol would detect the conflict and abort T0. But what happens to T0 then? Would it keep running the rest of the code, i.e., (*b)++ and _xend()? I think it would not keep running and would retry instead. But how does it know where the start of the loop is? How is this implemented in detail?

Solution

_xbegin() has very special semantics. If it successfully causes a thread to enter into transactional state, it returns _XBEGIN_STARTED. Entering transactional state does two important things:

1. An architectural checkpoint is taken on the processor. This essentially records the values of the architectural registers and the program counter.
2. Memory operations become speculative. Stores aren't visible to other threads until the transaction commits. This could be implemented in different ways, for example by buffering stores in a local cache, or by writing the value out to main memory while keeping an undo log.
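
A rough software analogy may make these two points more concrete. The sketch below is not how a CPU implements HTM, and every name in it (checkpoint, spec_a, tx_abort, tx_commit, increment_both) is hypothetical. It is single-threaded and uses setjmp as a stand-in for the architectural checkpoint, private copies as a stand-in for buffered stores, and an injected longjmp as a stand-in for a conflict abort.

```c
#include <setjmp.h>

static jmp_buf checkpoint;       /* stands in for the saved registers and PC */
static int shared_a, shared_b;   /* "memory" that other threads would see */
static int spec_a, spec_b;       /* speculative copies, private to the transaction */
static int attempts;             /* static, so its value survives longjmp */

/* Simulated abort: jump back to the checkpoint; setjmp then returns 1. */
static void tx_abort(void) {
    longjmp(checkpoint, 1);
}

/* Simulated commit: publish the buffered stores. */
static void tx_commit(void) {
    shared_a = spec_a;
    shared_b = spec_b;
}

/* Increment both counters "transactionally". The first conflicts_to_inject
   attempts are aborted halfway through to mimic a conflict.
   Returns the number of attempts that were needed. */
static int increment_both(int conflicts_to_inject) {
    attempts = 0;
    for (;;) {
        if (setjmp(checkpoint) == 0) {   /* like _xbegin() == _XBEGIN_STARTED */
            attempts++;
            spec_a = shared_a;           /* start from the committed state */
            spec_b = shared_b;
            spec_a++;                    /* speculative store to a */
            if (attempts <= conflicts_to_inject)
                tx_abort();              /* never returns */
            spec_b++;                    /* skipped entirely on an aborted attempt */
            tx_commit();
            return attempts;
        }
        /* Reached only after an abort: the speculative copies are simply
           overwritten on the next attempt, and the loop retries. */
    }
}
```

Calling increment_both(2) from a fresh state makes three attempts and leaves shared_a and shared_b both at 1: the two aborted attempts leave no trace in the shared variables, because their stores only ever touched the private copies.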

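If at some point that transaction aborts (in your example, T0 aborts due to a read/write conflict with T1), first all of the speculative memory operations are discarded, then the architectural checkpoint is restored. The latter ensures execution resumes just after int status = _xbegin(); as if the call had only now returned, except this time a status with the _XABORT_CONFLICT bit set is returned instead of _XBEGIN_STARTED. Anything T0 had already executed speculatively inside the transaction has no lasting effect, and control falls into the else branch, from where the while loop retries.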
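
The abort status is a bit mask, so a retry loop usually tests individual bits rather than comparing the whole value. Below is a minimal sketch of such a check. It assumes the bit positions documented for Intel RTM, which <immintrin.h> exposes as _XABORT_EXPLICIT, _XABORT_RETRY, _XABORT_CONFLICT, _XABORT_CAPACITY and _XABORT_CODE; the SK_ names are local stand-ins so the sketch compiles on machines without TSX. The should_retry policy and its parameters are hypothetical, not a recommended tuning.

```c
#include <stdbool.h>

/* Local stand-ins mirroring the RTM abort status bits (assumed layout). */
#define SK_XABORT_EXPLICIT (1u << 0)   /* aborted by an explicit _xabort() */
#define SK_XABORT_RETRY    (1u << 1)   /* hardware hints a retry may succeed */
#define SK_XABORT_CONFLICT (1u << 2)   /* memory conflict with another thread */
#define SK_XABORT_CAPACITY (1u << 3)   /* read/write set overflowed the buffers */
#define SK_XABORT_CODE(s)  (((s) >> 24) & 0xffu)  /* argument passed to _xabort() */

/* Hypothetical policy: retry on conflicts or when the hardware hints that a
   retry may help, but give up on capacity aborts or after max_attempts. */
static bool should_retry(unsigned status, int attempt, int max_attempts) {
    if (attempt >= max_attempts)
        return false;                  /* caller should take a fallback path */
    if (status & SK_XABORT_CAPACITY)
        return false;                  /* the same transaction will likely overflow again */
    return (status & (SK_XABORT_RETRY | SK_XABORT_CONFLICT)) != 0;
}
```

In a loop like the one in the question, the else branch could call should_retry(status, attempt, max_attempts) and switch to an ordinary lock when it returns false, since RTM does not guarantee that a transaction will ever commit.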

You can find some information about the implementation of hardware transactional memory in the blog article Arm’s Transactional Memory Extension support in gem5. If you're interested in the details, it's worth spending some time reading the gem5/Ruby source code.