I've been playing with Ruby's EventMachine for some time now and I think I understand its basics.
However, I am not sure how to read a large file (120 MB) efficiently. My goal is to read the file line by line and write every line into a Cassandra database (the same should work with MySQL, PostgreSQL, MongoDB etc.; I'm using Cassandra because its client supports EM explicitly). The simple snippet below blocks the reactor, right?
require 'rubygems'
require 'cassandra'
require 'thrift_client/event_machine'

EM.run do
  Fiber.new do
    rm = Cassandra.new('RankMetrics', "127.0.0.1:9160", :transport => Thrift::EventMachineTransport, :transport_wrapper => nil)
    rm.clear_keyspace!
    begin
      file = File.new("us_100000.txt", "r")
      while (line = file.gets)
        rm.insert(:Domains, "#{line.downcase}", {'domain' => "#{line}"})
      end
      file.close
    rescue => err
      puts "Exception: #{err}"
      err
    end
    EM.stop
  end.resume
end
But what's the right way to get a file read asynchronously?
There is no asynchronous file IO support in EventMachine. The best way to achieve what you're trying to do is to read a couple of lines on each tick and send them off to the database. The most important thing is not to read chunks that are too large, since that would block the reactor.
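EM.run do
  io = File.open('path/to/file')
  read_chunk = proc do
    lines_sent = 10
    10.times do
      if line = io.gets
        # send_to_db stands for your asynchronous DB call
        send_to_db(line) do
          # when the DB call is done
          lines_sent -= 1
          # schedule the next chunk once the whole chunk has been written
          EM.next_tick(read_chunk) if lines_sent == 0
        end
      else
        # end of file: close it, stop the reactor once, and leave the loop
        # (DB calls still in flight at this point may not get their callbacks)
        io.close
        EM.stop
        break
      end
    end
  end
  EM.next_tick(read_chunk)
end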
See What is the best way to read files in an EventMachine-based app?
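One thing the snippet above glosses over is the end of the file: ideally the reactor is stopped only after the last write has been acknowledged. Here is a minimal sketch of that bookkeeping, not a definitive implementation. The `ChunkedLineReader` class and its `schedule`, `sender` and `on_done` parameters are names I made up for illustration; the scheduler is injected so the example runs without EventMachine, and in a real app you would presumably pass `EM.method(:next_tick)` and `-> { EM.stop }`.

```ruby
require 'stringio'

# Hypothetical helper (not part of EventMachine): reads `io` a few lines per
# tick and reports completion only when EOF is reached AND every write is done.
class ChunkedLineReader
  def initialize(io, chunk_size:, schedule:, sender:, on_done:)
    @io = io
    @chunk_size = chunk_size
    @schedule = schedule   # defers a block to a later tick
    @sender = sender       # sender.call(line) { ... } must call the block when written
    @on_done = on_done     # called exactly once at the very end
    @pending = 0
    @reading = false
    @eof = false
    @finished = false
  end

  def start
    @schedule.call { read_chunk }
  end

  private

  def read_chunk
    @reading = true
    @chunk_size.times do
      line = @io.gets
      if line.nil?
        @eof = true
        break
      end
      @pending += 1
      @sender.call(line.chomp) { write_finished }
    end
    @reading = false
    advance
  end

  def write_finished
    @pending -= 1
    advance
  end

  # Move on only when the current chunk is fully read and fully written.
  def advance
    return if @reading || @pending > 0 || @finished
    if @eof
      @finished = true
      @on_done.call
    else
      @schedule.call { read_chunk }
    end
  end
end

# Usage with a stand-in tick queue instead of a real reactor.
ticks = []
written = []
reader = ChunkedLineReader.new(
  StringIO.new("Example.com\nFoo.org\nBar.net\n"),
  chunk_size: 2,
  schedule: ->(&work) { ticks << work },
  sender: ->(line, &done) { written << line.downcase; ticks << done },
  on_done: -> { written << :done }
)
reader.start
ticks.shift.call until ticks.empty?
```

The `@reading` flag is there so that a sender which completes immediately (inside the loop) cannot schedule the next chunk before the current one has been read.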