erlangtimeoutelixirphoenix-frameworkcowboy

Why does the Pry shell timeout so quickly in Phoenix/cowboy? (shell process exited with reason: shutdown)


When I use IEx.pry or a breakpoint (via :int.break()), the interactive shell dies too quickly, and I only have ~10 seconds before I lose my session:

** (EXIT from #PID<0.606.0>) shell process exited with reason: shutdown

And 10 seconds isn’t enough to productively debug my code in the shell/debugger. My best guess is that there’s a default timeout in a :supervisor.child_spec that I need to override, but I’m not 100% sure.

This is what I’ve tried so far (and why they haven’t worked so far):

How do people use the debugger/IEx.pry? (I'm coming from a Ruby/JS world, so I like to have some time to inspect the variables.) Are people just not encountering the 10s timeout like I am? Or is there some common configuration that I’m missing for my troubleshooting needs?

My Supervisor/child configs in application.ex:

Note the shutdown: :infinity configuration.

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      MyApp.Repo,
      {MyApp.Web.Endpoint, [shutdown: :infinity]},
      {Phoenix.PubSub, [name: MyApp.PubSub, adapter: Phoenix.PubSub.PG2]},
      {MyApp.Middleware.Ets.AnEtsThing, [name: MyApp.Middleware.Ets.AnEtsThing, table_name: :my_app_config_2]},
    ]
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  def config_change(changed, _new, removed) do
    MyApp.Web.Endpoint.config_change(changed, removed)
    :ok
  end
end

My cowboy configs in dev.exs

config :my_app, MyApp.Web.Endpoint,
  http: [
    port: 4000,
    protocol_options: [
      request_timeout: 100_000_000,
      shutdown_timeout: 100_000_000,
      idle_timeout: 100_000_000,
      linger_timeout: 100_000_000,
    ]
  ]

Console inspection of Cowboy configs

This is just to confirm that I was configuring Cowboy in the right place. I confirmed that toggling the port did affect the ports.

iex(4)> MyApp.Web.Endpoint.config(:http)
[
  port: 4001,
  protocol_options: [
    request_timeout: 100000000,
    shutdown_timeout: 100000000,
    idle_timeout: 100000000,
    linger_timeout: 100000000
  ]
]

This is what I see in the IEX console by running iex -S mix phx.server:

Note that there is no stacktrace that tells me what is killing my pry session.

Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
Request to pry #PID<0.718.0> at MyApp.Foo.foo/3 (lib/my_app/foo.ex:16)

   14: 
   15:   def foo(_parent, _args, %{context: %{bar: bar}}) do
   16:     require IEx; IEx.pry
   17: 
   18:     some_code()

Allow? [Yn] <=== I type "Enter"
        
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> DateTime.utc_now <=== I type this to show when the pry session starts
~U[2020-12-28 06:18:27.861251Z]
** (EXIT from #PID<0.718.0>) shell process exited with reason: shutdown
        
...
        
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> DateTime.utc_now
~U[2020-12-28 06:18:40.420951Z] <== ~13 seconds, pry session dies

Solution

  • I found the issue: the problem was in a 10s timeout I set in a separate JS project. (haha oops!) I ended up using the Erlang Observer to add a verbose trace on the cowboys_clear and found more inner details about the specific Cowboy error being sent, which led me to realize that the it was initiated from the JS.

    Explanation of how the JS (ie client) timeout affected cowboy/Phoenix debugging

    The CLIENT would repeatedly close the HTTP connection after 10 seconds, causing Cowboy (which Phoenix is built on) to kill its HTTP stream handler, which in turn would kill the Phoenix IEx.pry session.

    This meant that my attempts to increase the Cowboy timeout configurations wouldn't change the kill behavior. My timeouts were being changed but was not being triggered.

    How I troubleshot the Cowboy timeout issue via Erlang's :observer

    Though my JS issue is project-specific, you may need to dig into the lower-levels of Cowboy from your Phoenix application. These

    1. Run the Phoenix application iex -S mix phx.server
    2. Run :observer.start() to launch Observer GUI
    3. Click on the Applications tab, and look for my Phoenix endpoint (like Elixir.MyApp.Web.Endpoint)
    4. Right-click through its children, clicking on each's Dictionary tab to find one that has an '$initial_call' of cowboy_clear (it was nested 3 levels deep from MyApp.Web.Endpoint)
    5. (Here I was able to confirm that my triggered Pry breakpoints were a descendent from this cowboy_clear node with something like: self |> Process.info |> Keyword.get(:dictionary) |> Keyword.get(:"$ancestors").)
    6. Right-click on the cowboy_clear and select "Trace Process Tree" - I selected all the options.
    7. Change tabs to "Trace Overview", select the pid, and click "Start Trace"
    8. Wait for cowboy to die, and save the trace to a text file
    9. Search for 'exit', 'shutdown', or something like that
    10. In my case, I found 20:57:13:358740 (<0.5984.0>) exit {shutdown,{socket_error,closed,'The socket has been closed.'}}
    11. At this point, I guessed it was JS b/c Cowboy didn't seem to be the one that triggered the request closure

    Extra musing: theoretically I would have saved a bunch of time by running my request in GraphiQL or Postman because the timeout didn't exist there.

    Useful resources