
Which should I choose: Riak, Riak CS or both at the same time?


First, some background: we have an application that needs to store both JSON data and media assets (images, recorded sounds).

When looking at Riak I can see that right now we would be just fine with "normal" Riak since we don't handle very big files.

However, looking ahead, we will soon need to handle much bigger files (proprietary binary measurement files and video files), and for those Riak CS seems like an attractive alternative.

My question is this: In what way can I combine these two versions of Riak?

  1. Could I just go directly for Riak CS and store the JSON data files in there as well?
  2. Is it possible to start with Riak and then move over to Riak CS but keep the data from the Riak storage backends?
  3. Can I run both Riak and Riak CS on the same servers (multi backend)? Is Riak CS compatible with the Riak client API?
  4. Should I just separate the two and deploy on two clusters (min 10 nodes, 5 Riak + 5 Riak CS)?

Solution

  • These are four related questions in one! I am going to steer clear of the opinionated 'what should you do' and just state what is possible.

    1. Yes, you can store small files in Riak CS. However, this is another layer on top of Riak, so the requests will likely take a little bit longer.

    2. Yes, it is possible to use a Riak instance both directly and for Riak CS. I'm sure Riak CS has some reserved bucket names, but as long as you don't overlap those, you should, in theory, be able to store other data in the same Riak instance. Note though that Riak CS uses Riak bucket/key names that do not convert to JSON properly, so listing operations performed at the Riak level via HTTP may have trouble.

    3. I don't think the APIs are compatible, but you could, again in theory, run two instances of Riak on the same server as long as they use different node names, different directories, and listen on different ports. That would consume a lot of file handles, RAM, etc., but it could be possible.

    4. Separate clusters will likely be easier to troubleshoot than multiple instances on one node. I also suspect that if you ever need tech support, this will be the only one of these options that is supported.
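
If you do try two Riak instances on one server (point 3), it may help to sanity-check the planned settings before starting anything. The following is only a rough sketch of that check, not a Riak tool: the `RiakInstance` class, the `find_conflicts` helper, and every node name, directory, and port number below are made up for illustration and are assumptions, not values taken from a real deployment.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RiakInstance:
    """Hypothetical description of one Riak instance on a shared host."""
    node_name: str      # Erlang node name the instance would run under
    data_dir: str       # directory where this instance keeps its data
    ports: frozenset    # every TCP port this instance listens on


def find_conflicts(instances):
    """Compare every pair of instances and describe settings they share."""
    conflicts = []
    for a, b in combinations(instances, 2):
        if a.node_name == b.node_name:
            conflicts.append(f"node name clash: {a.node_name}")
        if a.data_dir.rstrip("/") == b.data_dir.rstrip("/"):
            conflicts.append(f"data directory clash: {a.data_dir}")
        shared = sorted(a.ports & b.ports)
        if shared:
            conflicts.append(f"port clash: {shared}")
    return conflicts


# Illustrative values only: one instance used directly, one backing Riak CS.
kv = RiakInstance(
    "riak_kv@10.0.0.5", "/var/lib/riak-kv", frozenset({8087, 8098, 8099})
)
cs_store = RiakInstance(
    "riak_cs_store@10.0.0.5", "/var/lib/riak-cs-store", frozenset({8087, 18098, 18099})
)

print(find_conflicts([kv, cs_store]))
```

With these made-up values the check reports that both instances want port 8087, which is the kind of overlap that would stop the second instance from starting. It does nothing about the file handle and RAM pressure mentioned above.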