multithreading operating-system locking critical-section mutual-exclusion

Peterson's solution just use one variable

For Pi:

do {
    turn = i;   // prepare enter section
    while(turn==j);
    //critical section 
    turn = j; //exit section. 
} while(true);

For Pj:

do {
    turn = j;   // prepare enter section
    while(turn==i);
    //critical section 
    turn = i; //exit section. 
} while(true);

In this simplified algorithm, if process i want to enter critical section for i, it will set "turn = i"(different from Peterson's solution which will set "turn = j"). this algorithm does not seem to cause deadlock or starvation, so why Peterson's algorithm not simplified like this?

Another Question: as i know, mutual exclusion mechanisms such as semaphore P/V operations require atomicity (P should do test sem.value and sem.value-- concurrently). but why the algorithm above just use one variable turn does not seem to require atomicity (turn = i, test turn == j not atomicity )?

Solution

Before you ask whether the algorithm avoids deadlock and starvation, you first have to verify that it still locks. With your version, even assuming sequential consistency, the operations could be sequenced like this:

Pi                                         Pj
turn = i;
while (turn == j); // exits immediately
                                           turn = j;
                                           while (turn == i); // exits immediately
// critical section                        // critical section

and you have a lock violation.

To your second question: it depends on what you mean by "atomicity". You do need it to be the case that when one thread stores turn = i; then the other thread loading turn will only read i or j and not anything else. On some machines, depending on the type of turn and the values of i and j, you could get tearing and load an entirely different value. So whatever language you are using may require you to declare turn as "atomic" in some fashion to avoid this. In C++ in particular, if turn isn't declared std::atomic, then any concurrent read/write access is a data race, and the behavior of the entire program becomes undefined (that's bad).

Besides the need to avoid tearing and data races, Peterson's algorithm also requires strict memory ordering (sequential consistency), which on many systems / languages is not guaranteed unless specially requested, again perhaps by declaring the variable as atomic in some fashion.

It is true that unlike more typical lock algorithms, Peterson doesn't require an atomic read-modify-write, only atomic sequentially consistent loads and stores. That's precisely what makes it an interesting and clever algorithm. But there's a substantial tradeoff in complexity and performance, especially if you want more than two threads, and most real-life systems do have reasonably efficient atomic RMW instructions, so Peterson is rarely used in practice.