The process by which the system sorts the map output on the map side is known as the sort. Is this part of the shuffle? In other words, when does the shuffle start: after the map output has been written to disk, or after the map output has been written to the in-memory buffer?
The whole MapReduce process is explained in detail here: http://ercoppa.github.io/HadoopInternals/AnatomyMapReduceJob.html
To answer your question, a single map task comprises the following steps:
The EXECUTION and SPILLING phases occur in parallel. So, data is written to a circular in-memory buffer -> sorted in memory -> when the buffer is 80% full (the default threshold) -> written to local disk.
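To make the spill threshold concrete, here is a minimal sketch in Python (not Hadoop code). It assumes a toy buffer of 100 bytes and a threshold of 0.80, and the function name `simulate_spills` is made up for illustration. Real Hadoop spills in a background thread while the mapper keeps writing to the buffer; this sketch does everything sequentially just to show when a spill is triggered.

```python
def simulate_spills(record_sizes, buffer_bytes=100, spill_percent=0.80):
    """Return the list of spills; each spill holds the record sizes flushed together.

    Simplified model: the buffer is emptied as soon as usage reaches the
    threshold, and whatever is left at the end is flushed as a final spill.
    """
    threshold = buffer_bytes * spill_percent
    buffered, used, spills = [], 0, []
    for size in record_sizes:
        buffered.append(size)
        used += size
        if used >= threshold:
            # Buffer is "almost full": flush its contents as one spill.
            spills.append(buffered)
            buffered, used = [], 0
    if buffered:
        # End of the EXECUTION phase: spill whatever remains.
        spills.append(buffered)
    return spills


spills = simulate_spills([30, 30, 30, 30, 30, 30, 10])
print(len(spills), spills)
```

With these toy sizes the buffer crosses the threshold twice, and the leftover record goes out in the last spill. In a real job the buffer size and threshold come from the `mapreduce.task.io.sort.mb` and `mapreduce.map.sort.spill.percent` settings.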
At the end of the EXECUTION phase, the SPILLING thread is triggered for the last time. In more detail, we:
Notice that each time the buffer becomes almost full, we get one spill file (SpillRecord + output file). Each spill file contains several partitions (segments).
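The layout of one spill file can be sketched roughly like this in Python. This is only an illustration under a few assumptions: `partition_of` is a made-up deterministic stand-in for the job's partitioner, `build_spill` is a hypothetical helper, and the index stores record positions and counts where the real SpillRecord tracks byte offsets and lengths.

```python
def partition_of(key, num_partitions):
    # Hypothetical stand-in for the job's partitioner; deterministic for the demo.
    return sum(key.encode("utf-8")) % num_partitions


def build_spill(records, num_partitions):
    """Sort buffered (key, value) records by (partition, key).

    Returns (data, index): data is the sorted record list (the "output file"),
    index is one (partition, start, length) entry per partition (the "SpillRecord").
    """
    ordered = sorted(
        records, key=lambda kv: (partition_of(kv[0], num_partitions), kv[0])
    )
    index, start = [], 0
    for p in range(num_partitions):
        length = sum(1 for k, _ in ordered if partition_of(k, num_partitions) == p)
        index.append((p, start, length))
        start += length
    return ordered, index


records = [("pear", 1), ("apple", 1), ("fig", 1), ("apple", 1), ("kiwi", 1)]
data, index = build_spill(records, num_partitions=3)
for p, start, length in index:
    print(p, data[start:start + length])
```

Each segment is the slice of the spill that belongs to one partition, sorted by key, so a reducer only ever needs the segment for its own partition.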