databasecassandradatabase-schemacqlschema-design

Cassandra (CQL) schema/tables look the same as RDBMS for my scenario


I have looked at the Twissandra examples. I asked a similar question regarding this a few days back and received some tips I implemented here. However, by looking at the tables (column families) I see barely any difference between this and a relational database.

My scenario: A simple address book where a user can create his own contacts and group them (one contact can be placed in many groups, one group can contain many contacts). A contact may have multiple addresses for example.

I want to retrieve all the contacts who live in address x and are placed in group y. Therefore, I did the following:

CREATE TABLE if not exists User (user_id uuid, contact_id uuid, type varchar, email varchar, PRIMARY KEY(id));
CREATE TABLE if not exists Contact (contact_id uuid, firstname varchar,lastname varchar, photo blob, imagelength int, note varchar, PRIMARY KEY (id));
CREATE TABLE if not exists Address (address_id uuid, contact_id uuid, street varchar, number int, zipcode varchar, country varchar, PRIMARY KEY(address_id));
CREATE TABLE if not exists Group (group_id uuid, user_id, groupname varchar, PRIMARY KEY(group_id));
CREATE TABLE if not exists Group_Contact (group_id uuid, contact_id, PRIMARY KEY(id, contact_id));

However, based on this, this is literally exactly the same as a relational database, well, except that I believe Cassandra is putting this data in a different way than a RDBMS on disk. I don't see how this can be made better in Cassandra and whether I even modeled this the right way. It just feels as a plain relational database. I feel that I did something wrong since I have to use application level joins to get the address of the contacts. I really don't know how I can de-normalize this to allow multiple addresses (and maybe even phones, emails).

Any suggestions to improve this scenario would be greatly appreciated!


Solution

  • As jny indicated, data duplication, denormalization and query-based modeling are keys to building good Cassandra data models. If I wanted to take your tables above, and build a table to support address/contact queries based-on country, I could do it like this:

    First, I'll create a user defined type for the contact's address.

    aploetz@cqlsh:stackoverflow> CREATE TYPE contactAddress (
                 ...   street varchar, 
                 ...   city varchar,
                 ...   zip_code varchar,
                 ...   country varchar);
    

    Next, I'll create a table called UserContactsByCountry to store user contact info, as well as any user contact addresses:

    aploetz@cqlsh:stackoverflow> CREATE TABLE UserContactsByCountry (
                 ...   country varchar,
                 ...   user_id uuid,
                 ...   type varchar,
                 ...   email varchar,
                 ...   firstname varchar,
                 ...   lastname varchar,
                 ...   photo blob,
                 ...   imagelength int,
                 ...   note varchar,
                 ...   addresses map<text, frozen <contactAddress>>,
                 ...   PRIMARY KEY ((country),user_id));
    

    A couple of things to note here:

    Next, I'll insert three user contacts, each with two addresses, two from the USA and one from Great Britain.

    aploetz@cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
    VALUES ('USA',uuid(),'Tech','brycelynch@network23.com','Bryce','Lynch','Head of R&D at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'8192 N. 42nd St.',city:'New York',zip_code:'10025',country:'USA'}});
    aploetz@cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
    VALUES ('USA',uuid(),'Reporter','edisoncarter@network23.com','Edison','Carter','Reporter at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'76534 N. 62nd St.',city:'New York',zip_code:'10024',country:'USA'}});
    aploetz@cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
    VALUES ('GBR',uuid(),'Reporter','theorajones@network23.com','Theora','Jones','Controller at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'821 Wembley St.',city:'London',zip_code:'W11 2BQ',country:'GBR'}});
    

    Now I can query that table for all user contacts in the USA:

    aploetz@cqlsh:stackoverflow> SELECT * FROM usercontactsbycountry WHERE country ='USA';
     country | user_id                              | addresses                                                                                                                                                                                    | email                      | firstname | imagelength | lastname | note                      | photo | type
    ---------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+-------------+----------+---------------------------+-------+----------
         USA | 2dee94e2-4887-4988-8cf5-9aee5fd0ea1e |  {'home': {street: '8192 N. 42nd St.', city: 'New York', zip_code: '10025', country: 'USA'}, 'work': {street: '101 Big Network Drive', city: 'New York', zip_code: '10023', country: 'USA'}} |   brycelynch@network23.com |     Bryce |        null |    Lynch | Head of R&D at Network 23 |  null |     Tech
         USA | b92612dd-dbaa-42f2-8ff2-d36b6c601aeb | {'home': {street: '76534 N. 62nd St.', city: 'New York', zip_code: '10024', country: 'USA'}, 'work': {street: '101 Big Network Drive', city: 'New York', zip_code: '10023', country: 'USA'}} | edisoncarter@network23.com |    Edison |        null |   Carter |    Reporter at Network 23 |  null | Reporter
    
    (2 rows)
    

    There are probably other ways in which this could be modeled, but this is one that I hoped to use to help you understand some of the techniques available.