ruby-on-rails ruby concurrency celluloid

Dynamically assigning actors in Celluloid

I'm learning how to use Celluloid. I’ve read all the documentation and think I understand how to use it, but I lack practice. I'm about to test it with a CSV file of almost 12,000 rows.

I’m unsure how many actors I should assign to a job. I'm guessing this number should be dynamic. According to this RailsCasts episode, the default pool size is the number of cores in your machine, but surely you should change this number based on your workload?

I have 12,000 records to get through. If I execute the code below, I'm guessing it will start all the actors in my pool and queue them up to handle the jobs. But how should I gauge how many actors to dynamically assign to the work?

There are still many holes in my understanding, so feel free to challenge my whole implementation.

class Model < ActiveRecord::Base
  include Celluloid
  def initialize(row)
    self.name = row[0]
    self.alt_id = row[1]
    self.definition = row[2]
    self.save
    self.terminate
  end    
end

CSV.open("./files/my_file.csv", "wb") do |csv|
  Model.supervise(csv)
end

Solution

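First, in your case you should create a separate class for your actor. Including Celluloid in the model turns every record into an actor, and overriding initialize gets in the way of ActiveRecord's own constructor. Keep the model plain and let a small worker class do the concurrent part.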

class Model < ActiveRecord::Base
  # Builds and saves one record from a CSV row of [name, alt_id, definition].
  def self.persist_from_csv(row)
    new.tap do |m|
      m.name = row[0]
      m.alt_id = row[1]
      m.definition = row[2]
      m.save
    end
  end    
end

class CSVWorker
  include Celluloid

  def persist_from_csv(row)
    Model.persist_from_csv(row)
  end
end

Then you can create a pool and do the work for each row.

pool = CSVWorker.pool(size: 4)
CSV.foreach("./files/my_file.csv") do |row|
  pool.async.persist_from_csv(row)
end

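Notice the async. The call returns immediately and is handed to a worker in the pool, which is what makes the rows run in pseudo-parallel. Without it, each call would block until the row was saved.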
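To make the effect of the pool size concrete, here is a minimal stdlib-only sketch of a fixed-size worker pool with fire-and-forget dispatch. It is not Celluloid's implementation; TinyPool and its methods are hypothetical names made up for illustration, and the rows are stand-ins for CSV data. The point is that only `size` jobs run at once and the rest wait in a queue, so all 12,000 rows get queued no matter how many workers you pick.

```ruby
# A stdlib-only illustration of a fixed-size worker pool.
# This is NOT how Celluloid is implemented; it only shows the general idea.
class TinyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker pulls jobs until it receives the :stop sentinel.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Returns immediately; the block runs later on one of the workers.
  def async(&job)
    @jobs << job
    nil
  end

  # Lets the queued jobs finish, then stops every worker.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

rows = [%w[a 1 x], %w[b 2 y], %w[c 3 z]] # stand-in for CSV rows
saved = Queue.new                        # thread-safe collector

pool = TinyPool.new(2)                   # two workers, three jobs
rows.each { |row| pool.async { saved << row[0] } }
pool.shutdown

names = Array.new(saved.size) { saved.pop }.sort
puts names.inspect
```

With two workers and three rows, the third job simply waits until a worker is free. Raising the size only helps while the workers spend their time waiting on something, such as the database.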

I admit I haven't tested this, but even if it Works™, you should benchmark it to see whether there's actually any gain from parallelization. I doubt it will be much faster on MRI: the global interpreter lock lets only one thread run Ruby code at a time, and the only IO involved here is the DB queries.
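A rough way to run that benchmark is to time the serial and the concurrent version on the same rows. The sketch below uses only the standard library and makes two loud assumptions: fake_insert is a hypothetical stand-in for the real save, and sleep simulates waiting on the database. Swap in your own insert and pool before trusting any numbers.

```ruby
# Simulated work: sleep stands in for waiting on the database.
# Real timings depend on your DB, schema, and connection pool.
def fake_insert(_row)
  sleep 0.01
end

# Wall-clock time of a block in seconds, using the monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

rows = Array.new(40) { |i| ["name#{i}", i, "definition"] }

serial = elapsed { rows.each { |row| fake_insert(row) } }

threaded = elapsed do
  # Four threads, each handling a contiguous slice of 10 rows.
  rows.each_slice(10).map do |slice|
    Thread.new { slice.each { |row| fake_insert(row) } }
  end.each(&:join)
end

puts format("serial:   %.3fs", serial)
puts format("threaded: %.3fs", threaded)
```

Because sleep releases the interpreter lock, the threaded run finishes sooner here. If the per-row work were mostly Ruby computation instead of waiting, MRI would show little or no gain, which is exactly what the benchmark is meant to reveal.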