
Do Riak CS instances interact with other Riak CS instances inside a cluster, and why is Riak CS deployed on each node of a cluster?


Riak CS implements the S3 API for an underlying Riak distributed, decentralized data storage system. Not only the Riak data storage system but also Riak CS must be deployed on each node. A further component called Stanchion must be deployed on a single node inside the cluster to keep user IDs and bucket names unique inside the cluster.

Unlike Riak and Riak CS, which both run on multiple nodes in your cluster, there should be only one running Stanchion instance in your Riak CS cluster at any time.


Solution

  • There is actually no need to install Riak CS on every Riak KV node. At last count, one of our larger Riak CS Support customers had only 20 Riak CS nodes for over 300 Riak KV nodes.

    Oversimplifying considerably, Riak CS is essentially a standalone client for Riak KV that provides an S3 interface, and Stanchion is the coordinator that makes sure everything gets put in the right place.

    Riak CS nodes do not communicate with each other. Running multiple CS nodes provides redundancy if a Riak CS node fails and shares the load when many clients connect simultaneously. Ideally, multiple Riak CS nodes should sit behind a load balancer; HAProxy is the most popular choice.

    The latest release at the time of writing is Riak CS 2.1.2 with Stanchion 2.1.2, which can be downloaded from https://files.tiot.jp/riak/cs/2.1/2.1.2/ and https://files.tiot.jp/riak/stanchion/2.1/2.1.2/ respectively. We eagerly anticipate the release of Riak CS 3.0 on OTP 20 and OTP 22 later this year.
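
To illustrate why independent Riak CS nodes work well behind a load balancer, here is a minimal Python sketch of round-robin selection with failover. It is a toy model only, not how HAProxy or Riak CS is implemented: the class `RoundRobinPool`, its methods, and the endpoint URLs are all hypothetical names chosen for illustration. Because the CS nodes share no state with each other, any healthy node can serve any request, so the balancer only needs to skip nodes that are down.

```python
class RoundRobinPool:
    """Toy model of a load balancer in front of independent Riak CS nodes.

    All names here are hypothetical; real deployments would use HAProxy
    or a similar balancer rather than client-side code like this.
    """

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.down = set()   # endpoints currently considered unhealthy
        self._next = 0      # index of the next endpoint to try

    def mark_down(self, endpoint):
        self.down.add(endpoint)

    def mark_up(self, endpoint):
        self.down.discard(endpoint)

    def pick(self):
        """Return the next healthy endpoint, skipping any marked down."""
        for _ in range(len(self.endpoints)):
            endpoint = self.endpoints[self._next]
            self._next = (self._next + 1) % len(self.endpoints)
            if endpoint not in self.down:
                return endpoint
        raise RuntimeError("no healthy Riak CS endpoint available")


if __name__ == "__main__":
    # Hypothetical hostnames and port, used purely as placeholders.
    pool = RoundRobinPool([
        "http://cs1.example:8080",
        "http://cs2.example:8080",
        "http://cs3.example:8080",
    ])
    pool.mark_down("http://cs2.example:8080")  # simulate a failed CS node
    for _ in range(4):
        print(pool.pick())
```

With the second node marked down, requests simply alternate between the remaining two. No coordination between CS nodes is required, which is consistent with the point above that the CS tier can be sized independently of the Riak KV tier.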