I am working on a database which has some types (e.g. User
, Appointment
, Task
etc.) which can have zero or more Notes
associated with each type.
The possible solutions I have come across for implementing these relationships are:
Suggested by many as being the easiest solution to implement and seemingly the most common implementation for frameworks that follow the Active Record pattern, I would add a table whose data is morphable
:
My notable_type
would allow me to distinguish between the type (User
, Appointment
, Task
) the Note
relates to, whilst the notable_id
would allow me to obtain the individual type
record in the related type
table.
PROS:
CONS
Alternatively I could create a table for each type which is responsible for the Notes
associated with that type only. The type_id
foreign key would allow me to quickly obtain the individual type
record.
Deemed by many online as a code smell, many articles advocate avoiding the polymorphic relationship in favour of an alternative (here and here for example).
PROS:
CONS:
type_notes
tableThe polymorphic relationship is certainly the simpler of the two options to implement, but the lack of foreign key constraints and therefore potential for consistency issues feels wrong.
A table per notes
relationship (user_notes
, task_notes
etc.) with foreign keys seems the correct way (in keeping with design patterns) but could result in a lot of tables (addition of other types
that can have notes
or addition of types similar to notes
[e.g. events
]).
It feels like my choice is either simplified table structure but forgo foreign keys and increased query overhead, or increase the number of tables with the same structure but simplify queries and allow for foreign keys.
Given my scenario which of the above would be more appropriate, or is there an alternative I should consider?
What is "table bloat"? Are you concerned about having too many tables? Many real-world databases I've worked on have between 100 and 200 tables, because that's what it takes.
If you're concerned with adding multiple tables, then why do you have separate tables for User
, Appointment
, and Task
? If you had a multi-valued attribute for User
, for example for multiple phone numbers per user, would you create a separate table for phones, or would you try to combine them all into the user table somehow? Or have a polymorphic "things that belong to other things" table for user phones, appointment invitees, and task milestones?
Answer: No, you'd create a Phone
table, and use it to reference only the User
table. If Appointments have invitees, that gets its own table (probably a many-to-many between appointments and users). If tasks have milestones, that gets its own table too.
The correct thing to do is to model your database tables like you would model object types in your application. You might like to read a book like SQL and Relational Theory: How to Write Accurate SQL Code 3rd Edition by C. J. Date to learn more about how tables are analogous to types.
You already know instinctively that the fact that you can't create a foreign key is a red flag. A foreign key must reference exactly one parent table. This should be a clue that it's not valid relational database design to make a polymorphic foreign key. Once you start thinking of tables and their attributes as concrete types (like described in SQL and Relational Theory), this will become obvious.
If you must create one notes table, you could make it reference one table called "Notable" which is like a superclass of User
, Appointment
, and Task
. Then each of those three tables would also reference a primary key of Notable
. This mimics the object-oriented structure of polymorphism, where you can have a class Note have a reference to an object by its superclass type.
But IMHO, that's more complex than it needs to be. I would just create separate tables for UserNotes
, AppointmentNotes
, and TaskNotes
. I'm not troubled by having three more tables, and it makes your code more clear and maintainable.