Apache Mesos and Rocks Cluster Distribution can both be used to run tasks and manage cluster resources.
What is the difference between them, and in what scenarios is it better to choose one over the other?
From what I understand the similarities are:
And likewise the differences are:
Why would someone use Apache Mesos over Rocks Cluster Distribution?
I'm not a Rocks user or expert, and I work at Mesosphere. These comments are based on research, not deep experience with Rocks. So take this with a heaping pile of salt... If someone knows better, I'm happy to take updates.
Rocks Cluster Distribution seems like a traditional distributed operating system designed for use on supercomputers, with the exception that it runs on an existing operating system rather than using its own microkernel. This gives it several evolutionary advantages over older distributed operating systems, like Plan 9, but it isn't designed to take advantage of more modern advancements in scheduling and hyperscale computing.
Rocks is definitely more mature than Mesos. This is both a pro and a con.
I think the best way to look at it is that Rocks solves problems academics and governments had around 2000, before VMware brought virtualization to the masses, before Chef, Puppet, and Ansible made cluster provisioning commonplace, before Google and AWS had planet-sized hyperscale computers spanning datacenters on every continent, before Hadoop popularized map/reduce and distributed computing, before agile invaded enterprise companies, before the iPhone put a supercomputer in everyone's pocket, before microservices made monoliths passé, before Docker popularized containerization, and before IoT put microchips in your shoes and thermostats. All these advancements in the last 15 years mean that people's problems have shifted significantly.
Mesos is only 5 years old. Even so, it's more mature than Docker and Kubernetes, it still supports non-containerized native processes (they get wrapped in a configurably isolated container transparently), and it has been used in production for years at massive scale by dozens of companies like Twitter and Apple.
Being new isn't always better, but the landscape moves really fast and it gets harder and harder to incorporate new ideas into old designs.
Modern cluster task schedulers (Hadoop YARN, Mesos, Kubernetes, etc.) allow for scheduling, monitoring, restarting, and re-scheduling tasks at runtime. Rocks, however, requires re-installing from RPM on every node. Often a grid computing system must be layered on top in order to actually use the resources efficiently.
Mesos, on the other hand, makes it easier to write custom schedulers for handling runtime task and application lifecycle management. Several very generic Mesos schedulers also already exist to handle common application lifecycles (Marathon, Aurora, etc.). Other distributed applications like Cassandra, Kafka, and Spark have their own custom schedulers to handle business-logic-specific lifecycle management, especially related to persistent data, drain cleanup, and auto-scaling.
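To make the "custom scheduler" idea a bit more concrete, here is a toy sketch of the core decision a scheduler makes: given resource offers from agents, decide which pending tasks to place where. This is plain Python and does not use the real Mesos API; `Offer`, `Task`, and `match_offers` are names I made up for illustration, and the first-fit strategy is just the simplest thing that works.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources an agent is offering (hypothetical, simplified)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A task waiting to be placed (hypothetical, simplified)."""
    name: str
    cpus: float
    mem_mb: int


def match_offers(offers, pending):
    """First-fit: put each task on the first offer that still has room."""
    remaining = {o.agent: [o.cpus, o.mem_mb] for o in offers}
    placements = {}
    unplaced = []
    for task in pending:
        for agent, (cpus, mem) in remaining.items():
            if task.cpus <= cpus and task.mem_mb <= mem:
                # Deduct what this task consumes from the offer.
                remaining[agent][0] -= task.cpus
                remaining[agent][1] -= task.mem_mb
                placements[task.name] = agent
                break
        else:
            # No offer was big enough; a real scheduler would wait
            # for the next round of offers and try again.
            unplaced.append(task.name)
    return placements, unplaced


offers = [Offer("agent-1", 2.0, 2048), Offer("agent-2", 4.0, 8192)]
pending = [Task("web", 1.0, 1024), Task("db", 3.0, 4096), Task("batch", 8.0, 1024)]
placements, unplaced = match_offers(offers, pending)
print(placements, unplaced)
```

A real framework scheduler layers the application-specific parts on top of this loop: what to do when a task dies, how to drain a node, when to scale up.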
Rocks was designed to support the premise of a single system image and has done so by making the cluster invisible to applications running on the cluster. This sounds like an amazing feat, but in practice it's hugely inefficient, causes unpredictable performance, and doesn't provide enough API to handle all the complexities of cluster operations.
In the meantime, Google, Amazon, and others are investing in hyperscale computing, which allows for tolerating massive growth at moderate cost without having to re-architect their infrastructure, platforms, or software.
Mesos provides a new layer of abstraction, rather than trying to emulate the lower levels of abstraction (like POSIX and single-machine OSs). So it is better equipped to handle cluster and node lifecycle events.
Rocks applications use POSIX sockets to communicate. While this enables a lot of low level flexibility, sockets were not designed to tolerate failure the same way network protocols are. Unlike legacy monoliths, modern microservices use network communication as their primary form of communication. This new architecture paradigm of extreme decoupling makes it so that applications don't need to be run together but instead use service discovery to find each other over the network. So modern clusters don't need to accommodate multi-node socket traffic, which frees them up to be significantly more reliable and fault tolerant.
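As a rough illustration of the service discovery pattern, here is a minimal in-memory sketch. `Registry` and `call_with_failover` are hypothetical names, not part of Mesos or any real discovery tool; in practice the registry would be something like DNS or a key-value store, and `send` would be an actual network call.

```python
class Registry:
    """Toy stand-in for a service discovery backend (hypothetical)."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._instances.get(service, []))


def call_with_failover(registry, service, send):
    """Look the service up by name and try each instance until one answers."""
    last_error = None
    for address in registry.lookup(service):
        try:
            return send(address)
        except ConnectionError as err:
            last_error = err  # this instance is down, try the next one
    raise ConnectionError(f"no healthy instance of {service!r}") from last_error


registry = Registry()
registry.register("billing", "10.0.0.1:8080")
registry.register("billing", "10.0.0.2:8080")


def fake_send(address):
    # Pretend the first instance has crashed.
    if address == "10.0.0.1:8080":
        raise ConnectionError("connection refused")
    return f"ok from {address}"


result = call_with_failover(registry, "billing", fake_send)
print(result)
```

The caller only knows the service name, so instances can come and go (or be rescheduled onto other nodes) without the caller being reconfigured.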
Mesos uses Zookeeper. Rocks uses MySQL.
Mesos allows but does not require workloads to use container images. You can easily just tarball up your process and Mesos will download it to the nodes that need it. Mesos optionally supports Docker containers, but the default is the Mesos container runtime, which has configurable and pluggable levels of isolation.
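For example, if you run Marathon on top of Mesos, a tarball-based app could be described roughly like this. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`, `fetch`) follow Marathon's app definition format as I understand it, so check them against the version you run; the URL, app id, and command are made up, and `tarball_app` is just a helper name for this sketch.

```python
import json


def tarball_app(app_id, tarball_url, cmd, cpus=0.5, mem_mb=256, instances=1):
    """Build a Marathon-style app definition that ships code as a tarball.

    There is no "container" section, so no Docker image is involved;
    the tarball is fetched into the task sandbox before cmd runs.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
        "fetch": [{"uri": tarball_url, "extract": True}],
    }


app = tarball_app(
    "/my-service",                                # hypothetical app id
    "https://example.com/my-service-1.0.tar.gz",  # hypothetical URL
    "./my-service-1.0/bin/start",                 # hypothetical command
)
payload = json.dumps(app, indent=2)
print(payload)
```

You would then submit the resulting JSON to Marathon (its REST API accepts app definitions via `POST /v2/apps`), and Marathon asks Mesos to launch the task on whichever agents have room.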
Mesos isn't an operating system. It's really more like a distributed kernel with master and agent configurations. If you really want to compare against another distributed operating system, take a look at DC/OS, which fills out a lot of the functionality around Mesos to make it into an operating system for your datacenter.