Tags: ruby-on-rails, unicorn, eventmachine, pubnub, keen-io

Large number of threads under unicorn


I am in the process of debugging some Postgres connection leaks in our application. A few days ago we suddenly crossed 100 connections when we shouldn't have, because we only have 8 unicorn workers and one sidekiq process (25 threads).

I was looking at htop today and saw that a ton of threads were being spawned from my unicorn workers, e.g.:

[htop screenshot: unicorn workers with many child threads]

Am I reading this correctly? This should not be happening, right? If these are threads being spawned, any idea how to debug this?

Thanks! By the way, my other problem (the Postgres connections) is here: Debugging unicorn postgres connection leak

EDIT

I just followed some tips from http://varaneckas.com/blog/ruby-tracing-threads-unicorn/ and printed the stack traces of the workers' threads. Here's what I got when there are many threads:

[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
(...the same two-line pop / spawn_threadpool trace repeats for each of the remaining threads...)
[17176] -------------------
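For reference, the backtrace-dumping trick from that post boils down to iterating `Thread.list`. Here is a minimal sketch; the helper name and output format are illustrative, not taken from the actual config:

```ruby
# Minimal sketch of the thread-tracing technique from the linked post.
# Prints each live thread's backtrace, prefixed with the worker's pid.
def dump_thread_backtraces(io = $stdout)
  Thread.list.each do |thread|
    io.puts "[#{Process.pid}] ---"
    (thread.backtrace || []).each { |line| io.puts "[#{Process.pid}] #{line}" }
  end
  io.puts "[#{Process.pid}] -------------------"
end

# In a unicorn worker you could wire it to a signal, e.g.:
# trap("TTIN") { dump_thread_backtraces }
```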

This is my unicorn.rb https://gist.github.com/steverob/b83e41bb49d78f9aa32f79136df5af5f and it spawns a thread for EventMachine in after_fork.

The reason for EventMachine is this --> https://github.com/keenlabs/keen-gem#asynchronous-publishing

Is this normal? Shouldn't the threads be killed? Could this also be causing unnecessary DB connections to stay open? Thanks!

UPDATE: I just found out that I am using an older version of the PubNub gem, which uses EM, and I ran into these lines in the pubnub.log file:

D, [2016-04-06T21:31:12.130123 #1573] DEBUG -- pubnub: Created event Pubnub::Publish
D, [2016-04-06T21:31:12.130144 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire
D, [2016-04-06T21:31:12.130162 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Adding event to async_events
D, [2016-04-06T21:31:12.130178 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Starting railgun
D, [2016-04-06T21:31:12.130194 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | starting EM in new thread
D, [2016-04-06T21:31:12.130243 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | We aren't running on thin
D, [2016-04-06T21:31:12.130264 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | EM already running

Solution

  • So the behavior seems to be normal in your particular case.

    The stack traces you provided (obtained using this method) point to the spawn_threadpool method in EventMachine. That code runs when something calls EventMachine.defer, a method which spawns a pool of 20 threads by default on its first invocation. I found a usage of EventMachine.defer in an older version of the pubnub gem (e.g. here), but it may well be called from other places too.

    So, I think this explains the high number of threads you observe on each worker. They mostly sit in the pop method, which suspends a thread until something is pushed onto the queue (again, see defer in EventMachine). So unless you have a high load of deferred operations, the threads are mostly doing nothing.
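    The idle `pop` frames make sense once you see how such a defer pool works. Here is a plain-Ruby sketch of the pattern (standard library only, not EventMachine's actual implementation):

```ruby
require "thread" # Queue; a no-op require on modern Rubies

# Each worker thread loops on a shared queue; Queue#pop suspends the
# thread until a job is pushed, which is exactly where the traces above
# show the EventMachine pool threads parked.
queue = Queue.new
pool  = 5.times.map do
  Thread.new do
    loop do
      job = queue.pop # thread parks here while the queue is empty
      job.call
    end
  end
end

results = Queue.new
3.times { |i| queue.push(-> { results.push(i * 2) }) }
vals = 3.times.map { results.pop }.sort

# All 5 workers are still alive afterwards, parked in `pop` again,
# just like EventMachine's defer pool when it is idle.
```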

    If you don't need 20 threads on each unicorn worker standing by for deferrable operations (most probably you don't), you can lower the size of the pool by setting threadpool_size to some reasonable number, e.g.:

    EventMachine.threadpool_size = 5
    

    I would put this somewhere in the after_fork block in your unicorn config.
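    For example, the relevant part of the unicorn config might look like this. This is a sketch only: it assumes, as your gist does, that EventMachine runs in a background thread started in after_fork, and omits everything else in the file:

```ruby
# config/unicorn.rb (fragment, illustrative)
after_fork do |server, worker|
  # Shrink EM's defer pool. This must take effect before the first
  # EventMachine.defer call, since that is what spawns the pool.
  EventMachine.threadpool_size = 5

  # Start the reactor in a background thread, as the gist does.
  Thread.new { EventMachine.run } unless EventMachine.reactor_running?
end
```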

    Also, as another option, you might consider using the unicorn-worker-killer gem to periodically kill unicorn's workers.
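    A sketch of that option, using the middleware documented in the gem's README; the thresholds and the app constant are illustrative, not from your setup:

```ruby
# config.ru (fragment, illustrative)
require "unicorn/worker_killer"

# Gracefully restart a worker after it has served 3072-4096 requests...
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096
# ...or once its memory use crosses roughly 192-256 MB.
use Unicorn::WorkerKiller::Oom, (192 * (1024**2)), (256 * (1024**2))

run MyApp::Application # hypothetical Rails app constant
```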

    By the way, the messages that pubnub writes to its log look OK: they just say that it found an already running EventMachine reactor, so it does not need to start a new one. This source code clarifies it.