I have a simple supervisor that looks like this
-module(a_sup).
-behaviour(supervisor).
%% API
-export([start_link/0, init/1]).
start_link() ->
supervisor:start_link({local,?MODULE}, ?MODULE, []).
init(_Args) ->
RestartStrategy = {simple_one_for_one, 5, 3600},
ChildSpec = {
a_gen_server,
{a_gen_server, start_link, []},
permanent,
brutal_kill,
worker,
[a_gen_server]
},
{ok, {RestartStrategy,[ChildSpec]}}.
When I run this on the shell, it works perfectly fine. But now I want to run different instances of this supervisor on different nodes, called foo and bar (started as erl -sname foo
and erl -sname bar
, from a separate node called main erl -sname main
). This is how I try to initiate this rpc:call('foo@My-MacBook-Pro', a_sup, start_link, []).
, but after replying with ok it immediately fails with this message
{ok,<9098.117.0>}
=ERROR REPORT==== 7-Mar-2022::16:05:45.416820 ===
** Generic server a_sup terminating
** Last message in was {'EXIT',<9098.116.0>,
{#Ref<0.3172713737.1597505552.87599>,return,
{ok,<9098.117.0>}}}
** When Server state == {state,
{local,a_sup},
simple_one_for_one,
{[a_gen_server],
#{a_gen_server =>
{child,undefined,a_gen_server,
{a_gen_server,start_link,[]},
permanent,false,brutal_kill,worker,
[a_gen_server]}}},
{maps,#{}},
5,3600,[],0,never,a_sup,[]}
** Reason for termination ==
** {#Ref<0.3172713737.1597505552.87599>,return,{ok,<9098.117.0>}}
(main@Prachis-MacBook-Pro)2> =CRASH REPORT==== 7-Mar-2022::16:05:45.416861 ===
crasher:
initial call: supervisor:a_sup/1
pid: <9098.117.0>
registered_name: a_sup
exception exit: {#Ref<0.3172713737.1597505552.87599>,return,
{ok,<9098.117.0>}}
in function gen_server:decode_msg/9 (gen_server.erl, line 481)
ancestors: [<9098.116.0>]
message_queue_len: 0
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 29
reductions: 425
neighbours:
From the message it looks like the call expects the supervisor to be a gen_server instead? And when I try to initiat a gen_server on the node like this, it works out just fine, but not with supervisors. I can't seem to figure out if there's something different in trying to initiate supervisor on local/remote nodes, and if yes, what should we do to fix the issue?
As per @JoséM's suggestion, the supervisor in the remote node is also linked to the ephemeral RPC process. However since supervisor does not provide a start
method, modifying the start_link()
method as
start_link() ->
Pid = supervisor:start_link({local,?MODULE}, ?MODULE, []).
unlink(Pid),
{ok, Pid}.
solves the issue.