elixir phoenix-framework phoenix-channels

Elixir + Phoenix Channels memory consumption

I'm pretty new to the Elixir and Phoenix Framework, so may be my question is a little bit dumb.

I have an app with Elixir + Phoenix Framework as a backend and Angular 2 as a frontend. I am using Phoenix Channels as a channel for front-end/back-end interchange. And I found a strange situation: if I send a large block of data from backend to frontend then particular channel process memory consumption go up to hundreds of MBs. And each connection (each channel process) eats a such amount of memory, even after transmission ends.

Here is a code snippet from backend channel description:

defmodule MyApp.PlaylistsUserChannel do
  use MyApp.Web, :channel

  import Ecto.Query

  alias MyApp.Repo
  alias MyApp.Playlist

  # skipped ... #

  # Content list request handler
  def handle_in("playlists:list", _payload, socket) do 
    opid = socket.assigns.opid + 1
    socket = assign(socket, :opid, opid)

    send(self, :list)
    {:reply, :ok, socket}
  end

  # skipped ... #        

  def handle_info(:list, socket) do

    payload = %{opid: socket.assigns.opid}

    result =
    try do
      user = socket.assigns.current_user
      playlists = user
                  |> Playlist.get_by_user
                  |> order_by([desc: :updated_at])
                  |> Repo.all

      %{data: playlists}
    catch
      _ ->
        %{error: "No playlists"}
    end

    payload = payload |> Map.merge(result)

    push socket, "playlists:list", payload

    {:noreply, socket}
  end

I've created a set with 60000 records just to test frontend ability to deal with such amount of data, but got a side effect - I found that particular channel process memory consumption is 167 Mb. So I open a few new browser windows and each new channel process memory consumption grew to this amount after the "playlists:list" request.

Is it normal behaviour? I would expect high memory consumption during database query and data offload, but it still the same even after request finished.

UPDATE 1. So with a big help of @Dogbert and @michalmuskala I found that after manual garbage collection memory is going to free.

I've tried to dig a little with recon_ex library and found the following examples:

iex(n1@192.168.10.111)19> :recon.proc_count(:memory, 3)
[{#PID<0.4410.6>, 212908688,
  [current_function: {:gen_server, :loop, 6},
   initial_call: {:proc_lib, :init_p, 5}]},
 {#PID<0.4405.6>, 123211576,
  [current_function: {:cowboy_websocket, :handler_loop, 4},
   initial_call: {:cowboy_protocol, :init, 4}]},
 {#PID<0.12.0>, 689512,
  [:code_server, {:current_function, {:code_server, :loop, 1}},
   {:initial_call, {:erlang, :apply, 2}}]}]

#PID<0.4410.6> is Elixir.Phoenix.Channel.Server and #PID<0.4405.6> is cowboy_protocol.

Next I went with:

iex(n1@192.168.10.111)20> :recon.proc_count(:binary_memory, 3)
[{#PID<0.4410.6>, 31539642,
  [current_function: {:gen_server, :loop, 6},
   initial_call: {:proc_lib, :init_p, 5}]},
 {#PID<0.4405.6>, 19178914,
  [current_function: {:cowboy_websocket, :handler_loop, 4},
   initial_call: {:cowboy_protocol, :init, 4}]},
 {#PID<0.75.0>, 24180,
  [Mix.ProjectStack, {:current_function, {:gen_server, :loop, 6}},
   {:initial_call, {:proc_lib, :init_p, 5}}]}]

and:

iex(n1@192.168.10.111)22> :recon.bin_leak(3)                  
[{#PID<0.4410.6>, -368766,
  [current_function: {:gen_server, :loop, 6},
   initial_call: {:proc_lib, :init_p, 5}]},
 {#PID<0.4405.6>, -210112,
  [current_function: {:cowboy_websocket, :handler_loop, 4},
   initial_call: {:cowboy_protocol, :init, 4}]},
 {#PID<0.775.0>, -133,
  [MyApp.Endpoint.CodeReloader,
   {:current_function, {:gen_server, :loop, 6}},
   {:initial_call, {:proc_lib, :init_p, 5}}]}]

And finally the state of the problem processes after recon.bin_leak (actually after garbage collection, of course - if I run :erlang.garbage_collection() with pids of these processes the result is the same):

 {#PID<0.4405.6>, 34608,
  [current_function: {:cowboy_websocket, :handler_loop, 4},
   initial_call: {:cowboy_protocol, :init, 4}]},
...
 {#PID<0.4410.6>, 5936,
  [current_function: {:gen_server, :loop, 6},
   initial_call: {:proc_lib, :init_p, 5}]},

If I do not run garbage collection manually - the memory "never" (at least, I've waited for 16 hours) become free.

Just to remember: I have such memory consumption after sending a message from the backend to the frontend with 70 000 records fetched from Postgres. The model is pretty simple:

  schema "playlists" do
    field :title, :string
    field :description, :string    
    belongs_to :user, MyApp.User
    timestamps()
  end

Records are autogenerated and look like this:

description: null
id: "da9a8cae-57f6-11e6-a1ff-bf911db31539"
inserted_at: Mon Aug 01 2016 19:47:22 GMT+0500 (YEKT)
title: "Playlist at 2016-08-01 14:47:22"
updated_at: Mon Aug 01 2016 19:47:22 GMT+0500 (YEKT)

I would really appreciate any advices here. I believe I'm not going to send such a big amount of data but even smaller data sets could lead to a huge memory consumption in case of many client connections. And since I haven't coded any tricky things probably this situation hides some more general problems (but it's just an assumtion, of course).

Solution

This is the classical example of the binary memory leak. Let me explain what's happening:

You handle a really big amount of data in the process. This grows the process heap, so that the process is able to handle all that data. After you're done with handling that data, most of the memory is freed, but the heap remains big and possibly holds the reference to the big binary that was created as the final step of handling the data. So now we have a large binary referenced by the process and a big heap with few elements in it. At this point the process enters a slow period only handling small amounts of data, or even no data at all. This means the next garbage collection will be very delayed (remember - heap is big), and it may take some really long time until the garbage collection actually runs and reclaims the memory.

Why the memory is growing in two processes? The channel process grows because of querying the database for all that data and decoding it. Once the result is decoded into structs/maps it is sent to the transport process (the cowboy handler). Sending messages between processes means copying, so all that data is copied over. This means the transport process has to grow to accommodate the data it's receiving. In the transport process the data is encoded into json. Both processes have to grow, and later stay there with big heaps and nothing to do.

Now to the solutions. One way would be to explicitly run :erlang.garbage_collect/0 when you know you've just processed a lot of data and won't do that again for some time. Another could be to avoid growing the heap in the first place - you could process the data in a separate process (possibly a Task) and only concern yourself with the final encoded result. After the intermediary process is done with processing the data, it will stop and free all of its memory. At that point you'll be only passing a refc binary between processes without growing the heaps. Finally there's always the usual approach for handling lots of data that's not needed all at once - pagination.