I'm trying to read URLs from a Redis store and fetch the HTTP status of each one, all within EventMachine. I don't know what's wrong with my code, but it isn't asynchronous as I expected.
All requests are fired, from the first to the last, and curiously I only get the first response (the HTTP header I want to check) after the last request has been sent. Does anyone have a hint as to what's going wrong?
require 'eventmachine'
require 'em-hiredis'
require 'em-http'

EM.run do
  @redis = EM::Hiredis.connect
  @redis.errback do |code|
    puts "Error code: #{code}"
  end
  @redis.keys("domain:*") do |domains|
    domains.each do |domain|
      if domain
        http = EM::HttpRequest.new("http://www.#{domain}", :connect_timeout => 1).get
        http.callback do
          puts http.response_header.http_status
        end
      else
        EM.stop
      end
    end
  end
end
I'm running this script for a few thousand domains, so I would expect to get the first responses before the last request is sent.
While EventMachine is asynchronous, the reactor itself is single-threaded. So while your loop is running and queuing up those thousands of requests, none of them make progress until the loop exits and control returns to the reactor. And if you call EM.stop at that point, you'll stop the reactor before they execute.
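To make the ordering concrete, here is a toy model of a single-threaded reactor in plain Ruby. It is not EventMachine and does no I/O; `ToyReactor`, `schedule`, and `run` are made-up names for illustration. It only shows why callbacks queued from inside a loop cannot run until that loop has returned.

```ruby
# A toy single-threaded reactor: NOT EventMachine, just a model of its scheduling.
class ToyReactor
  attr_reader :log

  def initialize
    @queue = [] # callbacks waiting for their turn on the single thread
    @log = []
  end

  # Queue a callback for later; nothing runs at this point.
  def schedule(&block)
    @queue << block
  end

  # Run queued callbacks one at a time, on the calling thread.
  def run
    @queue.shift.call until @queue.empty?
  end
end

reactor = ToyReactor.new

# This callback plays the role of the `domains.each` loop in the question.
reactor.schedule do
  3.times do |i|
    reactor.log << "request #{i} queued"
    reactor.schedule { reactor.log << "response #{i} handled" }
  end
  reactor.log << "loop finished"
end

reactor.run
puts reactor.log
```

All three "queued" lines and "loop finished" are logged before any "handled" line, which mirrors what the question describes with a few thousand domains.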
You can use something like EM::Iterator to break the processing of domains into chunks, so the reactor gets a chance to run between them. If you really want to call EM.stop at the end, you'll also need to keep a counter of dispatched requests and received responses, and stop the reactor only once they match.
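A rough sketch of that approach, under a few assumptions: `PendingTracker` and `check_domains` are hypothetical names, `domains` is a plain array of host names (in the question it would come from the Redis callback), and the concurrency of 20 is an arbitrary pick. Calling `iter.next` only once a request has finished is what keeps at most `concurrency` requests in flight.

```ruby
# Hypothetical helper: counts outstanding requests and runs a block
# once the last one has reported back.
class PendingTracker
  def initialize(total, &on_done)
    @remaining = total
    @on_done = on_done
  end

  def finish
    @remaining -= 1
    @on_done.call if @remaining.zero?
  end
end

# Hypothetical entry point; `domains` is assumed to be an array of host names.
def check_domains(domains, concurrency = 20)
  return if domains.empty?

  # Loaded on first call so the helper class above stays usable without the gems.
  require 'eventmachine'
  require 'em-http'

  EM.run do
    tracker = PendingTracker.new(domains.size) { EM.stop }

    EM::Iterator.new(domains, concurrency).each do |domain, iter|
      http = EM::HttpRequest.new("http://www.#{domain}", :connect_timeout => 1).get

      # Shared by success and failure so every request is counted exactly once.
      finished = proc do |status|
        puts "#{domain}: #{status}"
        iter.next      # let the iterator hand out the next domain
        tracker.finish # stops the reactor after the final response
      end

      http.callback { finished.call(http.response_header.http_status) }
      http.errback  { finished.call("request failed") }
    end
  end
end
```

Something like `check_domains(%w[example.com example.org], 2)` would then print a status per domain as responses arrive and exit once both are done. Counting failures as well as successes matters here, since otherwise a single timeout would keep the reactor running forever.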