There are tasks that read from a file, do some processing and write to a file. These tasks are to be scheduled based on the dependency. Also tasks can be run in parallel, so the algorithm needs to be optimized to run dependent tasks in serial and as much as possible in parallel.
eg:
So one way to run this would be run 1, 2 & 4 in parallel. Followed by 3.
Another way could be run 1 and then run 2, 3 & 4 in parallel.
Another could be run 1 and 3 in serial, 2 and 4 in parallel.
Any ideas?
Let each task (e.g. A,B,...
) be nodes in a directed acyclic graph and define the arcs between the nodes based on your 1,2,...
.
You can then topologically order your graph (or use a search based method like BFS). In your example, C<-A->B->D
and E->F
so, A
& E
have depth of 0 and need to be run first. Then you can run F
,B
and C
in parallel followed by D
.
Also, take a look at PERT.
How do you know whether B
has a higher priority than F
?
This is the intuition behind the topological sort used to find the ordering.
It first finds the root (no incoming edges) nodes (since one must exist in a DAG). In your case, that's A
& E
. This settles the first round of jobs which needs to be completed. Next, the children of the root nodes (B
,C
and F
) need to be finished. This is easily obtained by querying your graph. The process is then repeated till there are no nodes (jobs) to be found (finished).