Is there a limit to how many workers a single supervisor process should supervise? I have come back to this question several times while building an OTP supervision tree, with both performance and fault tolerance in mind:
Should a single supervisor process supervise all workers (thousands of them) of the same domain, or should there be a smaller number of intermediate supervisors under that main supervisor, each supervising a subset of the workers?
There is no limit on the number of children a supervisor can manage, other than the system limit on the maximum number of processes (which can be raised with the `+P` emulator flag). Keep in mind, too, that a supervisor does no active management on its own: unless it is asked to do something, all it does is sit and wait for exit messages from its children.
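A minimal sketch of the flat layout, assuming a recent OTP release with map-based child specs: the hypothetical module `flat_sup` is a `simple_one_for_one` supervisor with a single child spec, and each `start_worker/1` call adds one more child. The worker is a hypothetical stand-in (`start_worker_proc/1` spawning a bare receive loop) so the module is self-contained; a real system would point `start` at its own worker's `start_link` function. The intensity/period values are arbitrary.

```erlang
%% Hypothetical module: one flat supervisor for many same-type workers.
-module(flat_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/1]).
-export([init/1, start_worker_proc/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Adds one worker. With simple_one_for_one, [Id] is appended to the
%% argument list of the child spec's start MFA.
start_worker(Id) ->
    supervisor:start_child(?MODULE, [Id]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 10,   % at most 10 restarts...
                 period => 5},      % ...within 5 seconds
    ChildSpec = #{id => worker,
                  start => {?MODULE, start_worker_proc, []},
                  restart => transient,  % restart only on abnormal exit
                  shutdown => 5000,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.

%% Hypothetical stand-in for a real worker's start_link/1:
%% spawns a process linked to the supervisor running a trivial loop.
start_worker_proc(Id) ->
    Pid = proc_lib:spawn_link(fun() -> worker_loop(Id) end),
    {ok, Pid}.

worker_loop(Id) ->
    receive
        stop -> ok;                 % normal exit, so not restarted
        _Other -> worker_loop(Id)
    end.
```

From a shell, calling `flat_sup:start_worker(I)` for each `I` in `lists:seq(1, 10000)` starts ten thousand workers under one supervisor, and `supervisor:count_children(flat_sup)` should then report them as active. Between crashes the supervisor simply sits in its receive loop.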
Whether a single supervisor should manage all workers, or whether they should be divided among several child supervisors, is more of an architectural question. If all workers are of the same type and the `one_for_one` or `simple_one_for_one` strategy is used, a single supervisor managing the workers directly is the best solution. If the requirements are more complicated, it may be useful to group workers of the same type under a child supervisor, e.g. to implement a special restart strategy for that group.
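A minimal sketch of the grouped alternative, assuming map-based child specs and a hypothetical stand-in worker: the hypothetical module `nested_sup` plays both roles, a `one_for_one` top-level supervisor and, via a different `init/1` argument, a `one_for_all` group supervisor. `one_for_all` is only one example of a special restart strategy; the group size `?WORKERS_PER_GROUP` and the intensity/period values are arbitrary choices for illustration.

```erlang
%% Hypothetical module: acts as both the top-level and the group supervisor.
-module(nested_sup).
-behaviour(supervisor).

-export([start_link/1, start_group/1]).
-export([init/1, start_worker_proc/2]).

-define(WORKERS_PER_GROUP, 3).  % arbitrary, for illustration only

start_link(NumGroups) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, {top, NumGroups}).

start_group(GroupId) ->
    supervisor:start_link(?MODULE, {group, GroupId}).

init({top, NumGroups}) ->
    %% one_for_one at the top: restarting one group leaves the others alone.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Groups = [#{id => {group, G},
                start => {?MODULE, start_group, [G]},
                type => supervisor,
                shutdown => infinity}
              || G <- lists:seq(1, NumGroups)],
    {ok, {SupFlags, Groups}};
init({group, GroupId}) ->
    %% one_for_all inside a group: one crash restarts every worker in it.
    SupFlags = #{strategy => one_for_all, intensity => 3, period => 5},
    Workers = [#{id => {worker, W},
                 start => {?MODULE, start_worker_proc, [GroupId, W]},
                 type => worker,
                 shutdown => 5000}
               || W <- lists:seq(1, ?WORKERS_PER_GROUP)],
    {ok, {SupFlags, Workers}}.

%% Hypothetical stand-in for a real worker's start_link.
start_worker_proc(GroupId, W) ->
    Pid = proc_lib:spawn_link(fun() -> worker_loop({GroupId, W}) end),
    {ok, Pid}.

worker_loop(Id) ->
    receive
        _Msg -> worker_loop(Id)
    end.
```

Killing one worker in a group should cause all of that group's workers to be restarted, while the other groups keep running untouched. Using one module for both levels just keeps the sketch compact; in a larger system the two levels would usually be separate callback modules.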