I'm trying to use Celluloid to process some .csv data asynchronously. I've read that using futures enables you to wait for a pool of actors to finish before the main thread terminates. I've looked at some examples that demonstrate this.
However, when I implement it in my example code, it turns out that using futures is not any faster than doing the processing synchronously. Can anyone see what I'm doing wrong?
```ruby
require 'smarter_csv'
require 'celluloid/current'
require 'benchmark'

class ImportActor
  include Celluloid

  def process_row(row)
    100000.times { |n| n }
  end
end

def do_all_the_things_with_futures
  pool = ImportActor.pool(size: 10)
  SmarterCSV.process("all_the_things.csv").map do |row|
    pool.future(:process_row, row)
  end.map(&:value)
end

def do_all_the_things_insync
  pool = ImportActor.pool(size: 10)
  SmarterCSV.process("all_the_things.csv") do |row|
    pool.process_row(row)
  end
end

puts Benchmark.measure { do_all_the_things_with_futures }
puts Benchmark.measure { do_all_the_things_insync }
```
Output (columns are user, system, total and real time in seconds):

```
2.100000 0.030000 2.130000 ( 2.123381)
2.060000 0.020000 2.080000 ( 2.069357)
[Finished in 4.6s]
```
Are you using the standard Ruby MRI interpreter?
If so, you won't get any speed-up for entirely CPU-bound tasks, that is, tasks that don't do any I/O and spend all their time on calculations in the CPU. Your test task of 100000.times {|n| n} is indeed entirely CPU-bound.
The reason you won't get any speed-up from multi-threading for entirely CPU-bound tasks on MRI is that the MRI interpreter has a "Global Interpreter Lock" (GIL), which allows only one thread at a time to execute Ruby code, so the interpreter can't use more than one of your CPU cores at once. Multi-threaded parallelism, like the kind Celluloid gives you, can speed up CPU work only by running different threads on different CPU cores simultaneously, on a multi-core system like most systems are these days.
But in MRI, that's not possible. This is a limitation of the Ruby MRI interpreter.
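To see the CPU-bound effect without Celluloid or a CSV file, here is a minimal sketch using only the standard library. It assumes plain Thread objects as a stand-in for Celluloid futures (Thread#value waits for the thread to finish and returns its result, much like calling value on a future), and a hypothetical in-memory list of 40 rows in place of all_the_things.csv. The workload is the same 100000.times {|n| n} loop from the question.

```ruby
require 'benchmark'

# CPU-bound stand-in for the question's process_row: pure computation, no I/O.
def cpu_work(_row)
  100_000.times { |n| n } # Integer#times with a block returns the receiver
end

# Process every row one after another on the main thread.
def run_serial(rows)
  rows.map { |row| cpu_work(row) }
end

# Start one thread per row, then collect the results in order.
# Thread#value joins the thread and returns its block's result, similar to
# calling value on a future. Unlike pool(size: 10), this does not cap the
# number of concurrent threads.
def run_threaded(rows)
  rows.map { |row| Thread.new { cpu_work(row) } }.map(&:value)
end

# Hypothetical rows standing in for the parsed CSV.
rows = Array.new(40) { |i| { id: i } }

Benchmark.bm(8) do |x|
  x.report("serial")  { run_serial(rows) }
  x.report("threads") { run_threaded(rows) }
end
```

On MRI the two real times should come out roughly the same, because only one thread runs Ruby code at a time. Running the same script under an interpreter without a GIL, such as JRuby, on a multi-core machine would typically show the threaded version finishing sooner.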
If you install JRuby and run your test under JRuby, you should see a speed-up.
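To confirm which interpreter a script is actually running under (for example, after installing JRuby), you can print Ruby's built-in RUBY_ENGINE and RUBY_DESCRIPTION constants. A minimal sketch; the helper name running_on_mri? is hypothetical:

```ruby
# RUBY_ENGINE names the implementation: "ruby" for MRI (CRuby), "jruby" for JRuby.
# RUBY_DESCRIPTION adds version and platform details.
def running_on_mri?
  RUBY_ENGINE == "ruby"
end

puts RUBY_DESCRIPTION
puts "Engine: #{RUBY_ENGINE}, MRI? #{running_on_mri?}"
```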
If your task involved some I/O (such as making a database query, waiting on a remote HTTP API, or reading or writing significant amounts of file data), you could also see some speed-up under MRI. The larger the proportion of time your task spends on I/O, the bigger the speed-up. This is because, even though MRI doesn't allow threads to execute simultaneously on more than one CPU core, a thread waiting on I/O can still be switched out so that another thread can do work in the meantime. Without threads, the program would just sit idle while waiting on the I/O.
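A minimal sketch of the I/O case, again with plain Thread objects standing in for futures. It assumes Kernel#sleep as a stand-in for real I/O: like a blocking socket or file read, sleep releases MRI's global lock, so other threads can run while one waits. The 10 rows and the fake_io helper are hypothetical.

```ruby
require 'benchmark'

# Simulated I/O: sleep releases MRI's global lock while waiting,
# as a blocking network or file read does.
def fake_io(row)
  sleep 0.1
  row[:id]
end

# Hypothetical rows standing in for the parsed CSV.
rows = Array.new(10) { |i| { id: i } }

serial_results = nil
serial = Benchmark.realtime { serial_results = rows.map { |row| fake_io(row) } }

threaded_results = nil
threaded = Benchmark.realtime do
  # One thread per row; Thread#value waits for each and returns its result.
  threaded_results = rows.map { |row| Thread.new { fake_io(row) } }.map(&:value)
end

puts format("serial:   %.2fs", serial)   # roughly 10 waits of 0.1s each
puts format("threaded: %.2fs", threaded) # roughly a single 0.1s wait
```

Here the threaded version overlaps the waits even on MRI, which is the effect described above; with a real CPU-heavy step between the waits, the gain shrinks as the CPU share grows.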
If you google for "ruby GIL" you can find more discussions of the issue.
If you are really doing CPU-intensive work that could benefit significantly from multi-threaded parallelism, consider switching to JRuby.
And if you really do need multi-threaded parallelism, an alternative to Celluloid is the Futures or Promises from the concurrent-ruby gem. concurrent-ruby is generally simpler internally and lighter-weight than Celluloid. However, writing multi-threaded code can be tricky regardless of which tool you use. Even if Celluloid or concurrent-ruby give you better higher-level abstractions than working directly with threads, multi-threaded concurrency will still require you to become familiar with some concurrency techniques and to do some tricky debugging from time to time.