I use the Curb gem
(I also tried HTTParty) to perform a lot of HTTP requests, and this works well. But in one of my rake tasks (where I do 20k+ requests) I have a memory problem: Rails "eats" more than 2 GB of RAM until there is no free memory left.
It seems that Rails "doesn't wait" for the response and moves on to the next iteration of the loop in another thread; the problem is that this way a lot of objects are created that are never collected by the garbage collector (I think), and that is the reason for the memory leak.
Is there a way to tell Rails to wait until the response has arrived? (I tried sleep, but it is not a stable solution.)
I have pseudocode like this:
def the_start
  while start_date <= end_date do # ~140 iterations
    a_method_that_do_sub_specifics_call
  end
end

def a_method_that_do_sub_specifics_call
  some_data.each do |r| # ~180 iterations
    do_a_call
    # do something with models (update/create entries, ...)
  end
end

def do_a_call # called ~25k times
  # with the Curb gem:
  req = Curl::Easy.new do |curl|
    curl.ssl_verify_peer = false
    curl.url = url
    curl.headers['Content-type'] = 'application/json'
  end
  req.perform

  # current version, with the HTTParty gem:
  req = HTTParty.get(url,
    :headers => {'Content-type' => 'application/json'})
end
It seems that Rails doesn't wait for the result of req.perform.
EDIT:
I also tried instantiating the Curl::Easy object only once, using Curl::Easy.perform() and req.close (which should implicitly invoke the GC) after the call, but without success: memory usage is still huge. The only solution that (I think) can work is to "block" Rails until the response has arrived, but how?
EDIT 2
In another task I call only a_method_that_do_sub_specifics_call, without problems.
EDIT 3
After some performance tweaks (adding find_each(:batch_size => ...), GC.start, ...) the task works a little better: now the first ~100 iterations of do_a_call run fine, but after that memory usage jumps from 100 MB to 2 GB+ again.
After days of debugging and reading tons of forums and posts, I found the solution:
a modest class variable string that grows until a memory leak occurs.
Some useful notes that I picked up along the way:
Curb vs HTTParty
Of these two gems for performing curl requests, the better one in terms of performance is Curb.
http://bibwild.wordpress.com/2012/04/30/ruby-http-performance-shootout-redux/
Pay attention to class variables
My problem was a debug/info class variable string that kept growing; avoid class variables that are never collected by the garbage collector.
In my specific case it was:
@status = "#{@status} Warning - response is empty for #{description}\n"
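A minimal sketch of the difference (the class name StatusCollector and the constant MAX_ENTRIES are illustrative, not from my actual code): the first method is the leaking pattern, the second keeps memory bounded by discarding old entries.

```ruby
class StatusCollector
  MAX_ENTRIES = 100

  # Problematic pattern from above: the string grows on every call and is
  # never released, so memory usage grows with the number of requests.
  def log_unbounded(description)
    @status = "#{@status} Warning - response is empty for #{description}\n"
  end

  # Bounded alternative: keep only the last MAX_ENTRIES warnings, so
  # memory stays constant no matter how many requests are made.
  def log_bounded(description)
    @recent ||= []
    @recent << "Warning - response is empty for #{description}"
    @recent.shift while @recent.size > MAX_ENTRIES
  end

  def recent_warnings
    (@recent || []).join("\n")
  end
end
```

If you really need the full history, write it to a log file instead of keeping it in memory.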
Perform some manual garbage collection
Call GC.start at critical points to make sure memory that is no longer needed gets freed. Remember that calling GC.start doesn't trigger an immediate garbage collection run; it only suggests one.
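A sketch of how such periodic collection can be placed inside a long batch loop (the method name process_in_batches and the gc_every parameter are illustrative, not part of my task):

```ruby
# Suggest a garbage collection every `gc_every` iterations of a long loop.
# GC.start only asks Ruby to collect; it does not force an immediate sweep.
def process_in_batches(rows, gc_every: 500)
  gc_calls = 0
  rows.each_with_index do |_row, i|
    # ... perform the HTTP call and model updates for this row here ...
    if (i + 1) % gc_every == 0
      GC.start        # suggest a collection at this critical point
      gc_calls += 1
    end
  end
  gc_calls
end
```

Calling GC.start on every single iteration would slow the task down considerably; once per few hundred iterations is usually enough.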
Iterating over large ActiveRecord result sets
When loading many ActiveRecord rows, use .find_each, e.g.:
Model.find_each(:batch_size => 50) do |row|
This queries only 50 rows at a time (or whatever value you choose below the default), which is better than a single query returning a thousand rows. (I believe the default batch_size is 1000.)
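find_each needs a database behind it, but the batching idea can be shown with plain Ruby: each_slice yields fixed-size groups instead of the whole collection at once (the 1,000 integers here are a stand-in for model rows):

```ruby
records = (1..1_000).to_a  # stand-in for Model rows

batches = 0
records.each_slice(50) do |batch|
  # only 50 records are held by this block at a time,
  # analogous to find_each(:batch_size => 50)
  batches += 1
end
```

With find_each the memory win is the same: each batch of records can be garbage collected before the next one is loaded.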
Useful links: