Let's say I run a job in Spark with spark.speculation = true.
If a task (let's say T1) takes a long time, Spark launches a copy of T1, say T2, on another executor without killing off T1.
Now, if T2 also takes more time than the median of all successfully completed tasks, would Spark launch another task T3 on another executor?
If yes, is there any limit to this spawning of new copies? If not, does Spark limit itself to a single speculative copy per task and wait indefinitely for either one to complete?
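For context, here is how I enable it when building the session (the multiplier and quantile values are the defaults as I understand them; check the configuration docs for your Spark version):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("speculation-example")
  .config("spark.speculation", "true")            // enable speculative execution
  .config("spark.speculation.multiplier", "1.5")  // a task counts as slow if it runs > 1.5x the median
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before speculation kicks in
  .getOrCreate()
```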
The Spark TaskSetManager is responsible for that logic. When trying to launch a speculatable task, it checks that at most one copy of the original task is running. So in your example it will never launch T3, since there would already be two copies running.
You can find the relevant part of the code here.
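To illustrate the "at most one copy" rule, here is a stripped-down sketch of that check. Names like copiesRunning, runtimeMs, and the multiplier are simplified stand-ins for the idea, not the exact Spark internals:

```scala
// Simplified illustration of TaskSetManager's speculation check, not the real source.
object SpeculationCheckSketch {
  final case class TaskState(copiesRunning: Int, finished: Boolean, runtimeMs: Long)

  /** Indices of tasks that qualify for a speculative copy under these simplified rules. */
  def speculatableTasks(tasks: Seq[TaskState],
                        medianRuntimeMs: Long,
                        multiplier: Double = 1.5): Seq[Int] =
    tasks.zipWithIndex.collect {
      // Key condition: exactly one copy running. A task that already has a speculative
      // duplicate (two copies) is never duplicated again, so a T3 is never launched.
      case (t, i) if !t.finished &&
                     t.copiesRunning == 1 &&
                     t.runtimeMs > medianRuntimeMs * multiplier => i
    }
}
```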