Tags: ruby, file, asynchronous, cassandra, eventmachine

Read file in EventMachine asynchronously


I've been playing with Ruby EventMachine for some time now and I think I understand its basics.

However, I am not sure how to read a large file (120 MB) efficiently. My goal is to read a file line by line and write every line into a Cassandra database (the same should apply to MySQL, PostgreSQL, MongoDB, etc.; I am using Cassandra because its client supports EM explicitly). The simple snippet below blocks the reactor, right?

require 'rubygems'
require 'cassandra'
require 'thrift_client/event_machine'

EM.run do
  Fiber.new do
    rm = Cassandra.new('RankMetrics', "127.0.0.1:9160", :transport => Thrift::EventMachineTransport, :transport_wrapper => nil)
    rm.clear_keyspace!
    begin
      file = File.new("us_100000.txt", "r")
      while (line = file.gets)
        rm.insert(:Domains, line.downcase, {'domain' => line})
      end
      file.close
    rescue => err
      puts "Exception: #{err}"
      err
    end
    EM.stop
  end.resume
end

But what's the right way to read a file asynchronously?


Solution

  • There is no asynchronous file IO support in EventMachine. The best way to achieve what you're trying to do is to read a couple of lines on each tick and send them off to the database. The most important thing is not to read chunks that are too large, since that would block the reactor.

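    EM.run do
      io = File.open('path/to/file')
      read_chunk = proc do
        # Read at most 10 lines per tick so the reactor is never blocked for long
        lines = []
        while lines.size < 10 && (line = io.gets)
          lines << line
        end
        if lines.empty?
          # End of file and no DB calls outstanding: shut down
          io.close
          EM.stop
        else
          pending = lines.size
          lines.each do |l|
            # send_to_db stands in for your asynchronous DB call
            send_to_db(l) do
              # when the DB call is done
              pending -= 1
              EM.next_tick(read_chunk) if pending == 0
            end
          end
        end
      end
      EM.next_tick(read_chunk)
    end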
    

    See "What is the best way to read files in an EventMachine-based app?"
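
To see the chunk-per-tick pattern in isolation, here is a minimal sketch that runs without EventMachine or a database. `TickQueue` and `read_in_chunks` are hypothetical names invented for this illustration: `TickQueue` is only a toy stand-in for the reactor's `next_tick` queue, and the block passed to `read_in_chunks` plays the role of the database call. Unlike a real asynchronous client, the block here returns immediately, so the sketch shows only how the reads are spread across ticks, not how to wait for DB callbacks.

```ruby
require 'stringio'

# Toy stand-in for the reactor (an assumption for illustration only):
# a queue of callables, executed one per "tick".
class TickQueue
  def initialize
    @queue = []
  end

  # Schedule a callable (or block) to run on a later tick.
  def next_tick(callable = nil, &block)
    @queue << (callable || block)
  end

  # Run scheduled callables until none are left.
  def run
    @queue.shift.call until @queue.empty?
  end
end

# Reads up to chunk_size lines per tick and passes each line to the sink block.
# Returns the number of ticks spent reading, so the chunking is observable.
def read_in_chunks(io, chunk_size, reactor, &sink)
  ticks = 0
  read_chunk = proc do
    ticks += 1
    lines = []
    while lines.size < chunk_size && (line = io.gets)
      lines << line.chomp
    end
    unless lines.empty?
      lines.each { |l| sink.call(l) }
      # Give other scheduled work a turn before reading the next chunk.
      reactor.next_tick(read_chunk)
    end
  end
  reactor.next_tick(read_chunk)
  reactor.run
  ticks
end

# Usage: three lines, two per tick.
io = StringIO.new("a.com\nb.com\nc.com\n")
stored = []
ticks = read_in_chunks(io, 2, TickQueue.new) { |line| stored << line }
puts "stored #{stored.size} lines in #{ticks} ticks"
```

The final tick reads nothing and simply ends the loop, which is why the tick count is one higher than the number of non-empty chunks. The chunk size is the knob to tune: a smaller value yields to the reactor more often, at the cost of more scheduling overhead.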