I am reading semi-explicit parallelism in Haskell, and get some confusion.
par :: a -> b -> b
People say that this approach allows us to make automatically parallelization by evaluating every sub-expression of a Haskell program in parallel. But this approach has the following disadvantages:
1) It creates far too many small items of the work, which cannot be efficiently scheduled. As I understand, if you use par function for every lines of Haskell program, it will creates too many threads, and it's not practical at all. Is that right?
2) With this approach, parallelism is limited by data dependencies in the source program. If I understand correctly, it means every sub-expression must be independent. Like, in the par function, a and b must be independent.
3) The Haskell runtime system does not necessarily create a thread to compute the value of the expression a. Instead, it creates a spark, which has the potential to be executed on a different thread from the parent thread.
So, my question is : finally the runtime system will create a thread to compute a or not? Or if the expression a is needed to compute the expression b, the system will create a new thread to compute a? Otherwise, it will not. Is this true?
I am a newbie to Haskell, so maybe my questions are still basic for all of you. Thanks for your answer.
The par
combinator you mention is part of the Glasgow parallel Haskell (GpH) which implements semi-explicit parallelism, which however means it is not fully implicit and hence does not provide automatic parallelisation. The programmer still needs to identify subexperssions deemed worthwhile executing in parallel to avoid the issue you mention in 1).
Moreover, the annotation is not prescriptive (as e.g. pthread_create in C or forkIO in Haskell) but advisory, that means the runtime system finally decides whether to evaluate the subexpressions in parallel or not. This provides additional flexibility and a way for dynamic granularity control. Additionally, so-called Evaluation Strategies have been designed to abstract over par
and pseq
and to separate specification of coordination from computation. For instance, a strategy parListChunk
allows to chunk a list and force it to weak head normal form (this is a case where some strictness is needed).
2) Parallelism is limited by data dependencies in the sense that the computation defines the way the graph is reduced and which computation is demanded at which point. It is not true that every sub-expression must be independent. For instance E1 par E2 returns the result of E2, it means to be useful, some part of E1 needs to be used in E2 and hence E2 depends on E1.
3) The picture is slightly confused here because of the GHC-specific terminology. There are Capabilities (or Haskell Execution Contexts) which implement parallel graph reduction and maintain a spark pool and a thread pool each. Usually there is one Capability per core (can be thought of as OS threads). On the other end of the continuum there are sparks, which are basically pointers to parts of the graph that have not been evaluated yet (thunks). And there are threads (actually sort of tasks or work units), so that to be evaluated in parallel a spark needs to be turned into a thread (which has a so called thread state object that contains the necessary execution environment and allows a thunk to be evaluated in parallel). A thunk may depend on results of other thunks and blocks until these results arrive. These threads are much more lightweight than OS threads and are being multiplexed onto the available Capablities.
So, in summary, a runtime will not even neccesarily create a lightweight thread to evaluate a sub-expression. By the way, random work-stealing is used for load-balancing.
This is a very high-level approach to parallelism and avoids race conditions and deadlocks by design. The synchronisation is implicitly mediated through graph reduction. A nice position statement discusses further Why Parallel Functional Programming Matters. For more information on the abstract machine behind the scenes have a look at Stackless Tagless G-Machine and checkout the notes on Haskell Execution Model on GHC wiki (usually the most up-to-date documentation alongside the source code).