
The right Ruby driver for ScyllaDB


We decided to use ScyllaDB for some insert-heavy components in our Ruby on Rails project. However, the Ruby driver is said to be in maintenance mode, and we heard about performance issues with it a couple of years ago.

Is anyone actually running ScyllaDB in production with Ruby? Which driver do you use? How does it perform? Are there any pitfalls we should be aware of? By the way, I know about the DynamoDB-compatible Alternator API, but we really prefer CQL to the weird DynamoDB JSON query syntax, and we need additional Scylla features such as GROUP BY, multi-column partition keys and more.

Thanks!


Solution

  • We currently use a combination of Cequel and the Ruby driver you linked (which Cequel uses under the hood). In our first ScyllaDB/Cassandra project, we assumed that the schemas were more flexible than they actually are (e.g., you can't change keys around without careful planning), so Cequel sounded like a good fit. In our second project, where we chose keys and the like very deliberately, we use the underlying driver semi-directly (through Cequel::Metal). We handle migrations with a Rake task because migrations don't really work the same way as with PostgreSQL (up/down don't make sense in the traditional sense - you don't lose the new columns if you migrate down, you just lose them from new records).
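To make the migration point concrete, here is a minimal sketch of a forward-only, additive migration helper. This is not our actual Rake task and it is not part of Cequel: `AdditiveMigration`, the `events` table and its columns are hypothetical names, and `session` is assumed to be any object that responds to `execute(cql)`, such as a driver session.

```ruby
# Hypothetical helper for additive-only schema changes (not a Cequel API).
class AdditiveMigration
  Column = Struct.new(:name, :type)

  # columns: a Hash of column name => CQL type, e.g. { source: :text }
  def initialize(table, columns)
    @table = table
    @columns = columns.map { |name, type| Column.new(name.to_s, type.to_s) }
  end

  # One ALTER TABLE ... ADD statement per new column, in the order given.
  def statements
    @columns.map { |c| "ALTER TABLE #{@table} ADD #{c.name} #{c.type}" }
  end

  # Runs every statement against `session` (anything with #execute).
  # There is deliberately no `down`: this sketch only ever moves forward.
  def apply(session)
    statements.each { |cql| session.execute(cql) }
  end
end

# Usage: print the CQL that would be run for two new columns.
migration = AdditiveMigration.new("events", { source: :text, retries: :int })
puts migration.statements
```

A Rake task would then only need to build the session and call `apply` on each pending migration.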

    The default answer in the Cassandra community appears to be "run JRuby, use the JDBC driver". Don't do that. JRuby can be great for the right people, but it's not totally MRI compatible, and it doesn't perform the same. They'll next suggest ODBC. ruby-odbc should be considered a last-resort compatibility library. It has a lot of unimplemented ODBC features. It can leak ODBC state and lock up that thread, or crash the process(!) if the driver doesn't guard against poor thread safety. It will perform exceptionally poorly in Rails. Don't go that route either.

    These two suggestions are about all you get, at least from when I looked around. It would appear that within the Cassandra community, a lot of people are still applying decade-old impressions of Ruby to modern Ruby. By that, I mean they assume that JRuby is faster than MRI due to the JVM, because Twitter dropped Rails and switched to the JVM. This isn't really the case anymore (and hasn't been for some time). There are some situations where JRuby excels, but plenty where MRI beats it. The folks recommending JDBC probably have good intentions, but it feels a lot like "your language sucks, use ours". This attitude seems to lead them to spend time on, e.g., Python or Go drivers, but not Ruby drivers.

    If ScyllaDB were paying me to work on a driver, I would take their C/C++ driver and use FFI to wrap it and expose a decent API. I would probably not write an ActiveRecord driver because I don't use ScyllaDB/Cassandra for our primary data objects, and non-key-based queries (the main reason you'd use ActiveRecord) aren't possible without ALLOW FILTERING, which you probably do not want to make available to HTTP clients. You can use materialized views and all that, but then the query is slightly different. A library on top of this could map these concepts to ActiveRecord. The hard part with an FFI wrapper will be to thoughtfully design an idiomatic interface; the rest is thankfully pretty easy due to the FFI project's efforts.
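As a rough illustration of keeping client-supplied filters key-based so that ALLOW FILTERING is never needed, the sketch below builds a SELECT only when the filters match the partition key columns exactly. `KeyOnlyQuery`, the `events` table and its columns are hypothetical names for this example, not part of any driver, and clustering columns are ignored for brevity.

```ruby
# Hypothetical guard: only full-partition-key lookups are allowed.
class KeyOnlyQuery
  class NonKeyFilter < ArgumentError; end

  def initialize(table, partition_key)
    @table = table
    @partition_key = partition_key.map(&:to_s)
  end

  # Returns [cql, values]. The CQL uses ? placeholders so the values can be
  # bound by the driver instead of being interpolated into the string.
  def build(filters)
    filters = filters.transform_keys(&:to_s)

    extra = filters.keys - @partition_key
    raise NonKeyFilter, "not a key column: #{extra.join(', ')}" unless extra.empty?

    missing = @partition_key - filters.keys
    raise NonKeyFilter, "missing key column: #{missing.join(', ')}" unless missing.empty?

    # Emit conditions in partition key order, whatever order the caller used.
    where = @partition_key.map { |col| "#{col} = ?" }.join(" AND ")
    ["SELECT * FROM #{@table} WHERE #{where}", @partition_key.map { |col| filters[col] }]
  end
end

# Usage: a two-column partition key, with filters given in any order.
query = KeyOnlyQuery.new("events", %w[tenant_id day])
cql, values = query.build(day: "2024-01-01", tenant_id: 42)
puts cql
puts values.inspect
```

Anything that filters on a non-key column is rejected before it reaches the database, which is where a materialized view (queried by its own key) would come in instead.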