cassandra, wide-column-store

Comparing Cassandra structure with Relational Databases


A few days ago I read about the wide-column store type of NoSQL database, specifically Apache Cassandra.

What I understand is that Cassandra consists of:

A keyspace (like a database in the relational world) that holds many column families or tables (the same as tables in relational databases), each with an unlimited number of rows.

From the Stack Overflow tag description:

A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.

In Cassandra, every row in a table must have a row key, and each row key can have multiple columns. I have read about the differences between relational databases and NoSQL (Cassandra) in how they are implemented and how they store data.

But I don't understand the difference in structure:

Imagine a scenario in which I have a table (or column family in Cassandra):

When I execute a CQL query like this:

select * from users;

It gives me the following result:

lastname  | age  | city          | email               
----------+------+---------------+----------------------
      Doe |   36 | Beverly Hills | janedoe@email.com       
    Jones |   35 |        Austin | bob@example.com        
    Byrne |   24 |     San Diego | robbyrne@email.com         
    Smith |   46 |    Sacramento | null                    
   Jones2 | null |        Austin | bob@example.com       

Then I reproduce the same scenario in a relational database (MS SQL) with the following query:

select * from [users] 

And the result is:

lastname  | age  | city          | email               
----------+------+---------------+----------------------
      Doe |   36 | Beverly Hills | janedoe@email.com       
    Jones |   35 |        Austin | bob@example.com        
    Byrne |   24 |     San Diego | robbyrne@email.com         
    Smith |   46 |    Sacramento | NULL                    
   Jones2 | NULL |        Austin | bob@example.com       

I know that Cassandra supports dynamic columns, and I can add one with something like:

ALTER TABLE users ADD website varchar;

But this is also available in the relational model. For example, in MS SQL the same change can be made with something like:

ALTER TABLE users ADD website varchar(MAX);

What I see is that the results of the first and second select are the same. In Cassandra, the row key (lastname) is presented as a standalone object, but it looks the same as a unique field (like an ID or a text value) in MS SQL (and all relational databases). I also see that the column type in Cassandra is static (varchar in my example), unlike what the Stack Overflow tag describes.

So my questions are:

  1. Have I misunderstood something about Cassandra?

  2. What is the difference between the two structures? As I showed, the results are the same.

  3. Are there any special scenarios (JSON-like) that cannot be implemented in relational databases but that Cassandra supports? (For example, I know that nested columns are not supported in Cassandra.)

Thank you for reading.


Solution

  • We have to look at a more complex example to see the differences :)

    For a start:

    A table is defined as a "two-dimensional view of a multi-dimensional column family".
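To make that definition a bit more concrete, here is a minimal Python sketch. It is not Cassandra code: the nested `column_family` layout, the sample values and the `as_table` helper are all illustrative assumptions. It models a column family as a nested map (partition key, then clustering key, then column name) and flattens it into the two-dimensional rows a SELECT would display.

```python
# Hypothetical model: partition key -> clustering key -> {column: value}.
# The keys and values below are made up for illustration only.
column_family = {
    "jones": {
        "2015-01-01": {"city": "Austin", "email": "bob@example.com"},
        "2015-02-01": {"city": "Dallas"},  # no email cell exists for this row
    },
    "doe": {
        "2015-01-15": {"city": "Beverly Hills", "email": "janedoe@email.com"},
    },
}


def as_table(cf, columns):
    """Flatten the nested map into (partition, clustering, *columns) rows."""
    rows = []
    for partition_key, partition in cf.items():
        # Rows inside one partition are ordered by clustering key.
        for clustering_key in sorted(partition):
            cells = partition[clustering_key]
            # A missing cell surfaces as None, like null in the query output.
            rows.append((partition_key, clustering_key,
                         *(cells.get(c) for c in columns)))
    return rows


table = as_table(column_family, ["city", "email"])
for row in table:
    print(row)
```

The flat output looks like an ordinary SQL result set, which is why a simple `select *` cannot reveal the difference: the extra dimensions only become visible once a partition holds more than one row.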

    The term "wide rows" was mainly related to the Thrift API. In CQL it is defined a bit differently, but underneath it looks the same.

    Comparing SQL and CQL: in SQL, a table is a set of rows. In a simple example it looks as if the same is true in CQL, but it is not. A CQL table is a set of partitions, where each partition can be just a single row (e.g. when you don't have a clustering key) or multiple rows. A partition containing multiple rows is called a "wide row" in Thrift terminology. To see how it is stored underneath, please read e.g. the part about composite keys from here.
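As a rough illustration of the "partition as a wide row" idea, the following Python sketch groups CQL-style rows into one storage row per partition, with each cell named by a composite of the clustering value and the column name. This is only an approximation of the Thrift-era layout under stated assumptions: the `to_wide_rows` helper, the `year` clustering column and the sample data are hypothetical, and details of the real storage engine (such as marker cells and tombstones) are left out.

```python
from collections import defaultdict


def to_wide_rows(cql_rows, partition_col, clustering_col):
    """Group CQL-style rows into one storage row per partition.

    Each cell name is a composite (clustering value, column name).
    This is a simplified model, not the actual on-disk format.
    """
    storage = defaultdict(dict)
    for row in cql_rows:
        pk, ck = row[partition_col], row[clustering_col]
        for name, value in row.items():
            if name in (partition_col, clustering_col) or value is None:
                # Key columns live in the cell name; absent values take no cell.
                continue
            storage[pk][(ck, name)] = value
    # Cells within a partition are kept sorted by their composite name.
    return {pk: dict(sorted(cells.items())) for pk, cells in storage.items()}


# Hypothetical data: lastname is the partition key, year the clustering key.
cql_rows = [
    {"lastname": "Jones", "year": 2015, "city": "Austin", "email": "bob@example.com"},
    {"lastname": "Jones", "year": 2014, "city": "Dallas", "email": None},
    {"lastname": "Doe", "year": 2015, "city": "Beverly Hills", "email": "janedoe@email.com"},
]

wide = to_wide_rows(cql_rows, "lastname", "year")
for partition_key, cells in wide.items():
    print(partition_key, cells)
```

In this model the two "Jones" rows end up as one wide row with three cells, so the set of cell names differs from partition to partition even though the CQL schema has fixed, typed columns.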

    There are more differences:

    I hope I was able to make it a bit clearer for you. I recommend watching some videos (or reading the slides) from the DataStax Core Concepts course as a solid introduction to Cassandra.