
Connect to and work with a RethinkDB cluster


I can't seem to find a lot of documentation on how the clusters in RethinkDB actually work.

  1. In Cassandra I connect to a cluster by defining one or more hosts, so if one of them is down, or has even been removed, I can still connect to the whole cluster until the code/configuration is updated to reflect the changed host IP addresses.

As far as I understand, RethinkDB doesn't have such logic and I'd need to implement it myself, but I'd still be connected to the whole cluster at all times. Is that correct?

  2. When creating a database, it is "kind of" created for the whole cluster; there is no way and no need to specify the exact servers that will take care of it. When I create a table without specifying a primary replica tag, which server will be the primary replica? If I specify a tag that is assigned to multiple servers, the same question applies. How is the final server that will be the main replica selected?

Solution

  • In Cassandra I connect to a cluster by defining one or more hosts, so if one of them is down, or has even been removed, I can still connect to the whole cluster until the code/configuration is updated to reflect the changed host IP addresses.

    In RethinkDB, you connect to the cluster by connecting to a node in the cluster. That node takes care of communicating with all the other nodes in the cluster. If that node disconnects from the cluster, you might not be able to do writes or reads, depending on your cluster's sharding and replication. If that node fails, you won't be able to do anything through that connection. At that point, you can try connecting to another node.

    As far as I understand, RethinkDB doesn't have such logic and I'd need to implement it myself

    Yes, RethinkDB won't automatically reconnect you to another node in the cluster if your node fails. That being said, this might be as simple as having multiple connections and switching between them (unless I'm missing something!).
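
    The "multiple connections" idea could look roughly like the sketch below. It is only an illustration, not something the driver provides: the function name `connect_to_first_available`, the `retry_on` parameter, and the host names are all made up for this example. The connect function is passed in as an argument, so the snippet itself does not depend on any driver.

```python
from typing import Any, Callable, Iterable, Tuple


def connect_to_first_available(
    hosts: Iterable[Tuple[str, int]],
    connect: Callable[..., Any],
    retry_on: Tuple[type, ...] = (Exception,),
) -> Any:
    """Try each (host, port) in order and return the first connection that opens.

    `connect` is whatever function opens a connection (hypothetical here;
    it is called as connect(host=..., port=...)).
    """
    errors = []
    for host, port in hosts:
        try:
            return connect(host=host, port=port)
        except retry_on as exc:
            # This node is unreachable; remember why and try the next one.
            errors.append(f"{host}:{port} -> {exc}")
    # Every candidate failed, so report all the reasons at once.
    raise ConnectionError("no RethinkDB node reachable: " + "; ".join(errors))
```

    With the Python driver you would presumably pass its `r.connect` as `connect` and narrow `retry_on` to the driver's connection error class, e.g. `connect_to_first_available([("db1.example", 28015), ("db2.example", 28015)], r.connect)`. You would call it again whenever a query fails because the current node went away.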

    When creating a database, it is "kind of" created for the whole cluster, there is no way and no need to specify the exact servers which would be taking care of it.

    Yes, when you create a database it's created for the whole cluster. A database doesn't really 'live' in a specific node. It's only tables that live in a specific node.

    When I create a table without specifying a primary replica tag, which server will be the primary replica?

    RethinkDB will automatically take care of that. It will pick the server where the primary replica will be, based on the following:

    1. Server load distribution (which servers have more tables and data).
    2. Whether a specific server was already a primary/secondary for that table.

    If you want to control which server the primary or secondary ends up on, you can set it manually through the table_config table in the rethinkdb database. (You can take a peek at that database. It gives you a better view into how RethinkDB works!)
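
    As a rough sketch of what editing table_config might look like from Python: each row there has a `shards` list whose entries name a `primary_replica` and its `replicas`. The helper names `shard_config` and `pin_table` are invented for this example, and `r` and `conn` (the driver's query namespace and an open connection) are passed in rather than imported, so treat it as an outline to adapt rather than a tested recipe.

```python
from typing import Any, Dict, List


def shard_config(primary: str, replicas: List[str]) -> Dict[str, Any]:
    """Build one entry for the `shards` list of a table_config row."""
    if primary not in replicas:
        # The replicas list is expected to include the primary itself.
        raise ValueError("primary must be one of the replicas")
    return {"primary_replica": primary, "replicas": list(replicas)}


def pin_table(r: Any, conn: Any, db: str, table: str,
              shards: List[Dict[str, Any]]) -> Any:
    """Write `shards` into the table_config row for db.table.

    Assumes `r` is the RethinkDB query namespace and `conn` an open
    connection; both are supplied by the caller.
    """
    return (
        r.db("rethinkdb")
        .table("table_config")
        .filter({"db": db, "name": table})
        .update({"shards": shards})
        .run(conn)
    )
```

    For example, `pin_table(r, conn, "test", "users", [shard_config("server_a", ["server_a", "server_b"])])` would ask for a single shard with `server_a` as primary, where the database, table, and server names are placeholders.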

    If I specify a tag which is assigned to multiple servers - same question applies.

    Same as above.

    How is the final server which will be the main replica selected?

    Same as above.


    In terms of documentation, I would suggest the following:

    Sharding and replication: http://rethinkdb.com/docs/sharding-and-replication/ (Although your questions suggest you probably already read this :))