erlang timeout elixir phoenix-framework cowboy

Why does the Pry shell timeout so quickly in Phoenix/cowboy? (shell process exited with reason: shutdown)

When I use IEx.pry or a breakpoint (via :int.break()), the interactive shell dies too quickly, and I only have ~10 seconds before I lose my session:

** (EXIT from #PID<0.606.0>) shell process exited with reason: shutdown

And 10 seconds isn’t enough to productively debug my code in the shell/debugger. My best guess is that there’s a default timeout in a :supervisor.child_spec that I need to override, but I’m not 100% sure.

This is what I’ve tried so far (and why they haven’t worked so far):

Increase the :shutdown parameter to :infinity in the child_specs to Supervisor.start_link() in MyApp.Application.
Updated the cowboy timeouts (idle_timeout, linger_timeout, request_timeout, shutdown_timeout) because Observer indicated that my request is a cowboy process. I don’t think it’s likely that it’s cowboy, though. If the default is 60s, I’m not even getting close to 60s.
Instead of using IEx.pry, I tried using the debugger (via :ni.break)
Just noting that I can’t use mix test --trace, which sets the timeout to :infinity. Since I’m trying to debug a non-test Endpoint action, this flag won't help.

How do people use the debugger/IEx.pry? (I'm coming from a Ruby/JS world, so I like to have some time to inspect the variables.) Are people just not encountering the 10s timeout like I am? Or is there some common configuration that I’m missing for my troubleshooting needs?

My Supervisor/child configs in application.ex:

Note the shutdown: :infinity configuration.

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      MyApp.Repo,
      {MyApp.Web.Endpoint, [shutdown: :infinity]},
      {Phoenix.PubSub, [name: MyApp.PubSub, adapter: Phoenix.PubSub.PG2]},
      {MyApp.Middleware.Ets.AnEtsThing, [name: MyApp.Middleware.Ets.AnEtsThing, table_name: :my_app_config_2]},
    ]
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  def config_change(changed, _new, removed) do
    MyApp.Web.Endpoint.config_change(changed, removed)
    :ok
  end
end

My cowboy configs in dev.exs

config :my_app, MyApp.Web.Endpoint,
  http: [
    port: 4000,
    protocol_options: [
      request_timeout: 100_000_000,
      shutdown_timeout: 100_000_000,
      idle_timeout: 100_000_000,
      linger_timeout: 100_000_000,
    ]
  ]

Console inspection of Cowboy configs

This is just to confirm that I was configuring Cowboy in the right place. I confirmed that toggling the port did affect the ports.

iex(4)> MyApp.Web.Endpoint.config(:http)
[
  port: 4001,
  protocol_options: [
    request_timeout: 100000000,
    shutdown_timeout: 100000000,
    idle_timeout: 100000000,
    linger_timeout: 100000000
  ]
]

This is what I see in the IEX console by running iex -S mix phx.server:

Note that there is no stacktrace that tells me what is killing my pry session.

Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
Request to pry #PID<0.718.0> at MyApp.Foo.foo/3 (lib/my_app/foo.ex:16)

   14: 
   15:   def foo(_parent, _args, %{context: %{bar: bar}}) do
   16:     require IEx; IEx.pry
   17: 
   18:     some_code()

Allow? [Yn] <=== I type "Enter"
        
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> DateTime.utc_now <=== I type this to show when the pry session starts
~U[2020-12-28 06:18:27.861251Z]
** (EXIT from #PID<0.718.0>) shell process exited with reason: shutdown
        
...
        
Interactive Elixir (1.10.3) - press Ctrl+C to exit (type h() ENTER for help)
pry(1)> DateTime.utc_now
~U[2020-12-28 06:18:40.420951Z] <== ~13 seconds, pry session dies

Solution

I found the issue: the problem was in a 10s timeout I set in a separate JS project. (haha oops!) I ended up using the Erlang Observer to add a verbose trace on the cowboys_clear and found more inner details about the specific Cowboy error being sent, which led me to realize that the it was initiated from the JS.

Explanation of how the JS (ie client) timeout affected cowboy/Phoenix debugging

The CLIENT would repeatedly close the HTTP connection after 10 seconds, causing Cowboy (which Phoenix is built on) to kill its HTTP stream handler, which in turn would kill the Phoenix IEx.pry session.

This meant that my attempts to increase the Cowboy timeout configurations wouldn't change the kill behavior. My timeouts were being changed but was not being triggered.

How I troubleshot the Cowboy timeout issue via Erlang's `:observer`

Though my JS issue is project-specific, you may need to dig into the lower-levels of Cowboy from your Phoenix application. These

Run the Phoenix application iex -S mix phx.server
Run :observer.start() to launch Observer GUI
Click on the Applications tab, and look for my Phoenix endpoint (like Elixir.MyApp.Web.Endpoint)
Right-click through its children, clicking on each's Dictionary tab to find one that has an '$initial_call' of cowboy_clear (it was nested 3 levels deep from MyApp.Web.Endpoint)
(Here I was able to confirm that my triggered Pry breakpoints were a descendent from this cowboy_clear node with something like: self |> Process.info |> Keyword.get(:dictionary) |> Keyword.get(:"$ancestors").)
Right-click on the cowboy_clear and select "Trace Process Tree" - I selected all the options.
Change tabs to "Trace Overview", select the pid, and click "Start Trace"
Wait for cowboy to die, and save the trace to a text file
Search for 'exit', 'shutdown', or something like that
In my case, I found 20:57:13:358740 (<0.5984.0>) exit {shutdown,{socket_error,closed,'The socket has been closed.'}}
At this point, I guessed it was JS b/c Cowboy didn't seem to be the one that triggered the request closure

Extra musing: theoretically I would have saved a bunch of time by running my request in GraphiQL or Postman because the timeout didn't exist there.

Useful resources

Using Erlang Observer in Elixir - I ran mix phx.server in debugging mode, and ran the observer command from the IEx prompt
Cowboy architecture and execution flow - This was helpful in understanding the process structure, so I could guess that I needed to trace cowboy_clear
Plug.Cowboy - Explanation of elixir configuration
cowboy_http configuration options - These were the cowboy timeout configs I was toggling before realizing it was JS

Why does the Pry shell timeout so quickly in Phoenix/cowboy? (shell process exited with reason: shutdown)

Explanation of how the JS (ie client) timeout affected cowboy/Phoenix debugging

How I troubleshot the Cowboy timeout issue via Erlang's :observer

Useful resources

How I troubleshot the Cowboy timeout issue via Erlang's `:observer`