sqldatabase-integritylinktable

Your Opinion? - linking tables by LinkTypeId


I have a web app that links files and notes to customers, users, projects, etc. Initially, I had tables such as customerNote, userNote, projectNote...etc.

Design considerations:
1: I don't want to manage N squared tables (over 100 customerNote, userNote, projectNote,... customerFile, projectFile...etc tables)
2: I don't want to use dynamic SQL (TableName from LinkType)
3: I don't see a clean way to use linktables (without 100+ N squared linktables)

Now, I have one Note table that has LinkId and LinkTypeId. LinkId of course, is the PK of the client|User|Project|etc tables; and LinkTypeId points to the type of link.

So this:
SELECT * FROM customerNote WHERE Id = 1210
SELECT * FROM userNote WHERE Id = 3281

Has now become this:
SELECT * FROM Note WHERE LinkId = 1210 AND LinkTypeId = 2 (2 being customer)
SELECT * FROM Note WHERE LinkId = 3281 AND LinkTypeId = 3 (3 being user)

I enjoy the simplicity of this approach, and I have wrapped them into functions that I call all over the place.

My questions are:
1: Without referential integrity what performance or other issues would I have?
2: Does this cause scalability problems?
3: Is there an elegant solution?

This is my first SO post, and I thank you all in advance for your help.


Solution

  • 1: Without referential integrity what performance or other issues would I have?

    You'll potentially have all the problems referential integrity is intended to eliminate. You'll have to either live with those problems, or implement an imitation of all the referential integrity constraints in application code or through administrative procedures (like reports).

    You'll also greatly increase the size of your "notes" table. 100 thousand rows in each of 100 tables of free-form notes is pretty manageable. But 10 million rows in one table of notes might make you reconsider whether life is worth living.

    Implementing an imitation of integrity constraints in application code means that, sooner or later, somebody (probably you) will side-step the application, and change rows through the dbms command-line client or gui client. You can do a lot of damage that way. The sane thing to do is to checkpoint or dump the database before you take such a risk, but 10 million rows of notes makes it less likely that you'll do that.

    2: Does this cause scalability problems?

    It can. If you have 100 separate notes tables, each of them can grow to 100,000 rows and still be fast to query. Put them all in one table, and now you've got 10 million rows. And because they're notes, fewer will fit on a page. That usually means slower speed. With this design, the single table of notes becomes a cold spot (or hot spot, depending on how you look at it), slowing every table that uses notes, not just one or two heavily annotated tables.

    And after living with slower speed on all tables for a couple of months, you're likely to split that monster table up into the original tables again.

    3: Is there an elegant solution?

    If every note is supposed to have the same maximum length--a pretty unlikely requirement for 100 notes tables--then create a domain for the notes, and create one table of notes for each annotated table.

    create domain note_text as varchar(1000) not null;
    create table user_notes (
      user_id integer not null references users (user_id) on delete cascade,
      note_timestamp timestamp not null default current_timestamp,
      user_note note_text,
      primary key (user_id, note_timestamp)
    );
    

    As an aside, you need to be really careful allowing users to annotate rows. They will often (usually?) use note columns in lieu of putting data where it belongs. For example, if you have a table of users' phone numbers, then a note column will almost certainly end up with data line this.

    Call 123-456-7890 between 8:00 am and 5:00 pm. (And that will match 
       none of this user's phone numbers.)
    Toll-free orders at 1-800-123-4567.
    He eats lunch at McDonald's on Tuesdays.