javajpadatabase-designsurrogate-key

join between two many-to-many tables


I have some trouble defining appropriate database design for my java based web application. I have 3 tables(Tag, DT and Property) and my scenario is as follow. Each Tag can contain many Dts and each DT can be assigned to different Tags, each DT has many Properties and each Property can be used in many DTs. When a DT is assigned to a Tag, then the user can set values for all its properties.

My question is that how can I define a relation such that each Tag can have its properties' values base on the DT that is assigned to it. what are missing tables and relations in my design. I have to create relative Entities based on this db then.

Tag1==> DT1 ==> Initial values1, property values set 1 ; 
Tag2==> DT1 ==> Initial values1, property values set 2 ;

Here is the db diagram. And thanks for your help in advance.

enter image description here

Here is some sample data

enter image description here


Solution

  • The central issue with your design, as I see it, is the use of the surrogate keys. It is not necessary to always create numeric single-column keys for tables. And even if you do so, this doesn't guarantees you'll be free of duplicates. This, in fact, forces system to keep more indexes on your tables, which is an extra job to be done.

    1. Some concepts:

      • I use singular for table names, 'cos each tuple in the table represents one object of the relation;
      • I use small letters for table and column names (identifiers) and big letters for the keywords. I don't like CamelCase in databases;
      • I use <table_name>_id for PK columns. It's also possible due to singular table names;
      • I use prefix + table name + details for all constraints and indexes I create.

      You don't have to follow them, but it'd be very nice to stick to a consistent naming patterns for your design.

    2. I would start off with property_type dictionary:

      CREATE TABLE property_type (
          property_type   varchar(20) NOT NULL,
          CONSTRAINT p_property_type PRIMARY KEY (property_type)
      );
      

      It is a one-column table, that exists only to provide a domain of possible values for the properties types. I used varchar(20), textual column is pretty fine. A benefit of this — you don't have to join back to this table via numeric keys to get what property_type_id=123 means.

    3. Properties:

      CREATE TABLE property (
          property_id     integer     NOT NULL,
          property_name   varchar(50) NOT NULL,
          property_type   varchar(20) NOT NULL,
          CONSTRAINT p_property PRIMARY KEY (property_id),
          CONSTRAINT u_property_name UNIQUE (property_name),
          CONSTRAINT f_property_type FOREIGN KEY (property_type)
            REFERENCES property_type ON UPDATE CASCADE
      );
      

      I decided to go with a numeric PK here, for I suppose you might want to rename properties at some point. And should you change property_type, updates will be cascaded.

      Despite the fact there's a PK here already, UNIQUE constraint on names is a must here, otherwise you risk to end up in a situation where you have equally named properties with different IDs.

    4. DT table:

      CREATE TABLE dt (
          dt_id       integer     NOT NULL,
          dt_name     varchar(50) NOT NULL,
          CONSTRAINT p_dt PRIMARY KEY (dt_id),
          CONSTRAINT u_dt_name UNIQUE (dt_name)
      );
      

      Again, having just PK is not enough, creating also UNIQUE constraint. Though I would very much got rid of dt_id completely and would keep only dt_name and would have made it a PK.

    5. Properties for the DT:

      CREATE TABLE dt_property (
          dt_id           integer     NOT NULL,
          property_id     integer     NOT NULL,
          initial_value   varchar(50) NOT NULL,
          CONSTRAINT p_dt_property PRIMARY KEY (dt_id, property_id),
          CONSTRAINT f_dt_id FOREIGN KEY (dt_id) REFERENCES dt,
          CONSTRAINT f_property_id FOREIGN KEY (property_id) REFERENCES property
      );
      

      This is the first big difference from your design — composite key is used.

      Yes, it means you'll have to carry out 2 columns whenever you want to refer to the entry in this table. But this is not a big deal, really — you design table once, you write your queries also once, but your software might be used for quite some time if it's done properly and easily maintained. It's better to spend a bit more time writing queries and get easily maintained system in the long term.

    6. Tags:

      CREATE TABLE tag (
          tag_id      integer     NOT NULL,
          tag_name    varchar(50) NOT NULL,
          CONSTRAINT p_tag PRIMARY KEY (tag_id),
          CONSTRAINT u_tag_name UNIQUE (tag_name)
      );
      

      This is just another dictionary. Again, like for dt table, I would really want to avoid using tag_id column and keep just tag_name, making it also a PK.

    7. New table tag_dt introduced:

      CREATE TABLE tag_dt (
          tag_id      integer     NOT NULL,
          dt_id       integer     NOT NULL,
          CONSTRAINT p_tag_dt PRIMARY KEY (tag_id, dt_id),
          CONSTRAINT f_tag_id FOREIGN KEY (tag_id) REFERENCES tag,
          CONSTRAINT f_dt_id FOREIGN KEY (dt_id) REFERENCES dt
      );
      

      This table is required to create dt + tag relations. Without it, you have data duplications — you can see it on your schema, you have 2 rows with Tag_name='Tag1'.

    8. Finally, tag properties:

      CREATE TABLE tag_property (
          tag_id      integer     NOT NULL,
          dt_id       integer     NOT NULL,
          property_id integer     NOT NULL,
          a_value     varchar(50) NOT NULL,
          CONSTRAINT p_tag_property PRIMARY KEY (tag_id, dt_id),
          CONSTRAINT u_tag_property UNIQUE (tag_id, property_id),
          CONSTRAINT f_tag_property_tag FOREIGN KEY (tag_id, dt_id) REFERENCES tag_dt,
          CONSTRAINT f_tag_property_property FOREIGN KEY (dt_id, property_id)
            REFERENCES dt_property
      );
      

      This table is full composite keys and it conforms to all your requirements. Primary key is tag_id, dt_id and it is also a Foreign key to the tag_dt table, so you want be able to introduce something that hadn't been defined previously. Next, tag_id, property_id is unique, meaning tag's properties cannot duplicate. Finally, dt_id, property_id references dt_property table, which means that only properties allowed for that dt will be registered.

    Final notes

    All Primary and Unique keys are implemented via indexes in most DBMSes. Also, some DBMSes can use composite indexes (multi-column ones) even if first column of the key is omitted. At least PostgreSQL can do it, this is what I use most.

    Please, review your dt and tag tables, I highly recommend getting rid of the surrogate keys in those, just like property_type is done.

    I haven't created any extra indexes, typically I take this exercise after Data Model is implemented and some real queries are done against it.

    Also, don't use value or name for column names. These are reserved words and you might have unexpected effects in the future incarnations of your DBMS of choice.