I want to implement the following job/resource scheduling problem:
- A DAG of jobs where edges encode precedence relations.
- Jobs may be executed in parallel if they do not have a precedence relation.
- Multiple resource pools, each pool containing one or more similar resources.
- A job may depend on one or more resources from one or more pools. E.g., job J1 might say: "I need 2 resources from pool P1 and 7 resources from pool P2."
- A job may express that it wants exactly the same resources as one of its immediate predecessors. E.g., job J2 might say: "I need 1 resource from pool P1, but it must be one of the resources that job J1 got assigned." As a simplification, I assume that job J2 must be a direct successor of J1 for this kind of constraint.
- A resource dependency can be read, write, both, or "don't care".
- Suppose job J1 writes to resources from pool P1 and its successor job J2 has a read dependency on "the same resource as J1 got from P1". In between, the resource is not available to other jobs for writing, because resources are stateful.
- I don't know the execution time of each job in advance and there are also no priorities nor deadline requirements on the jobs.
I am looking for:
- a way to express this problem in a formal domain,
- an offline schedulability test that answers whether a given job graph can be executed under the stated requirements and constraints,
- a suggestion for an online scheduling algorithm.
If there were no resource pools, but only 1 resource of each type, the problem would probably be much simpler. I am familiar with the basics in graph theory and simple dataflow analysis algorithms.
I think I'd modify your description by introducing the concept of "named resources" where a named resource is a name and a collection of unnamed resources. Then jobs can depend on named and unnamed resources, and every named resource must stay resident from the time that the first job using it starts to the time that the last job using it ends.
Formally, we have
- n jobs J = {0, 1, …, n-1}
- an irreflexive, transitive precedence relation ≺ on J
- a set of resource types R
- a set of names X
- a map A from R to ℕ indicating the available resources (where ℕ = {0, 1, 2, …} is the natural numbers)
- a map U from J to ℕ^R indicating unnamed resource requirements (ℕ^R is the set of maps from R to ℕ)
- a map Y from J to 2^X indicating named resource requirements (2^X is the set of subsets of X)
- a map Z from X to ℕR indicating the composition of named resources.
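To make the model concrete, here is one possible Python encoding of an instance; the `Instance` class and the example values are my own invention, not part of any library:

```python
# A small example instance of the formal model above.
# All names (Instance, its fields) are illustrative only.
from dataclasses import dataclass

@dataclass
class Instance:
    n: int                          # jobs are 0 .. n-1
    prec: set[tuple[int, int]]      # (j1, j2) means j1 ≺ j2
    A: dict[str, int]               # available resources per type
    U: dict[int, dict[str, int]]    # unnamed requirements per job
    Y: dict[int, set[str]]          # named-resource requirements per job
    Z: dict[str, dict[str, int]]    # composition of each named resource

# Three jobs in a chain; jobs 0 and 2 share the named resource "x".
inst = Instance(
    n=3,
    prec={(0, 1), (1, 2), (0, 2)},  # ≺ is transitive, so (0, 2) is included
    A={"P1": 2, "P2": 7},
    U={0: {"P2": 1}, 1: {"P1": 1}, 2: {}},
    Y={0: {"x"}, 1: set(), 2: {"x"}},
    Z={"x": {"P1": 1}},
)
```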
For checking schedule feasibility, there's no reason to run jobs in parallel. Therefore, what we want is a linear extension < of ≺ that fits. In formal terms closer to what a solver could handle, we would define < from a bijection π from J to {0, 1, …, n-1} that satisfies
- for all jobs j1 ≺ j2, we have π(j1) < π(j2) [< is a linear extension]
- for each time t in {0, 1, …, n-1}, the resources in use do not exceed what's available. Formally (urgh), for each named resource x ∈ X, let s_x and e_x be the minimum and maximum values of {π(j) | j ∈ J ∧ x ∈ Y(j)} [i.e., the times of the jobs that depend on x]. Then for all t, we want
∑_{x ∈ X} [s_x ≤ t ≤ e_x] Z(x) + U(π⁻¹(t)) ≤ A,
where [condition] is the Iverson bracket (1 if the condition is met, else 0) and ≤ is the standard partial order on vectors.
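As a sanity check, this test can be transcribed almost literally into a brute-force search over permutations. It is exponential, but useful as a reference implementation for small n (all names here are mine):

```python
from itertools import permutations

def feasible(n, prec, A, U, Y, Z):
    """Brute-force schedulability test: try every linear extension of ≺
    and check the resource bound at each of the n time steps."""
    # Only names actually used by some job have a residency window.
    used_names = [x for x in Z if any(x in Y.get(j, ()) for j in range(n))]

    def fits(order):
        pos = {j: t for t, j in enumerate(order)}
        if any(pos[a] >= pos[b] for a, b in prec):
            return False                      # not a linear extension of ≺
        # Residency window [s_x, e_x] of each named resource x.
        span = {x: (min(pos[j] for j in range(n) if x in Y.get(j, ())),
                    max(pos[j] for j in range(n) if x in Y.get(j, ())))
                for x in used_names}
        for t, j in enumerate(order):
            for r, avail in A.items():
                used = U.get(j, {}).get(r, 0) + sum(
                    Z[x].get(r, 0)
                    for x, (s, e) in span.items() if s <= t <= e)
                if used > avail:              # ∑ [s_x ≤ t ≤ e_x] Z(x) + U > A
                    return False
        return True

    return any(fits(order) for order in permutations(range(n)))
```

For example, two jobs that each need one unit of "P1" in sequence are feasible with A("P1") = 1, but a job needing 2 units is not.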
To test schedule feasibility, I'd feed something like this formulation to a CP-SAT solver (e.g., https://developers.google.com/optimization/cp/cp_solver).
To schedule online, I'd use a variant of Dijkstra's Banker's algorithm that uses the offline test to decide whether it's safe to start a job whose predecessors have finished. This recovers the parallelism, since it may be OK to start multiple jobs at once.
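The online loop could be structured roughly like this. Here `safe` stands in for the offline test applied to the remaining work with the committed jobs fixed, and `Executor` is a toy stand-in that "finishes" jobs in FIFO order, just to make the control flow runnable:

```python
class Executor:
    """Toy executor: jobs complete instantly, oldest first.
    A real implementation would dispatch to workers and block in wait_any."""
    def __init__(self):
        self.pending = []
    def start(self, j):
        self.pending.append(j)
    def wait_any(self):
        return self.pending.pop(0)

def run_online(n, prec, safe):
    """Banker's-style dispatch loop: a ready job is started only if `safe`
    says the remaining graph stays schedulable with that job committed."""
    done, running, order = set(), set(), []
    ex = Executor()
    while len(done) < n:
        ready = [j for j in range(n)
                 if j not in done and j not in running
                 and all(a in done for (a, b) in prec if b == j)]
        for j in ready:
            if safe(done, running | {j}):   # Banker's check before starting j
                running.add(j)
                ex.start(j)
        j = ex.wait_any()                   # block until some job finishes
        running.remove(j)
        done.add(j)
        order.append(j)
    return order
```

Note that this sketch assumes `safe` never rejects every ready job while nothing is running; a real implementation would need to handle that case (it cannot arise if the offline test accepted the full graph).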