databaserelational-databaseprimary-keyrdbmscandidate-key

How do primary keys work in junction tables for a DBMS? How can a composite key be a primary key?


In a DBMS we have

  1. Superkey - An attribute or a set of attributes that uniquely identifies a row in a table.
  2. Candidate Key - An attribute or set of attributes that uniquely identify identifies a row in a table. The difference between the superkey and a candidate key is no subset of a candidate key can itself be a candidate key.
  3. Primary key - A chosen candidate key that become the attribute to uniquely identify a row.

If we want to identify a many to many relation between two tables we can define a junction table such as:

Tables:

Author(AuthorID, FirstName, LastName) -- AuthorID is primary key
Book(BookID, BookTitle) -- BookID is primary key

To create the relation between both:

AuthorBook(authorID, BookID) -- together authorID and BookID are primary key

I am thinking bookID and authorID are both primary keys in their own respect.

Since a candidate key (and therefore a primary key) must not have a subset containing a candidate key, how can authorID plus BookID be a primary key? This seems to break the definition of a primary key.

I understand this may be the difference between real world an theory but as the DBMS textbooks I have read seem to define junction tables this way and define primary keys this way it seems like there is a disconnect there.

Am I misunderstanding this concept?


Solution

  • When we use one of those terms we have to be talking about a given table (variable, value or expression). The superkeys, CKs & PKs of a table are not determined by roles its attributes play in other tables. They are determined by what valid values can arise for the table under the given business rules.

    Superkey - An attribute or a set of attributes that uniquely identifies a row in a database.

    A superkey of a given table can be defined as a set of attributes that "uniquely identifies a row" of the table. (Not database.) Although that quoted phrase is a kind of shorthand that isn't a very clear description if you don't already know what it means.

    A superkey of a given table can be defined as a set of attributes whose subrow values can only appear once in the table. Or as a set of attributes that functionally determines every set of attributes in the table.

    When a superkey has just one attribute we can sloppily talk about that attribute being a superkey.

    Candidate Key - An attribute or set of attributes that uniquely identifies a row in a database.

    It's true that every CK (candidate key) of a certain table is a superkey of that table. But you mean that a set of attributes is by definition a superkey when/iff that and some other condition(s) hold. But you don't clearly say that when you write this section.

    The difference between the superkey and a candidate key is no subset of a candidate key can itself be a candidate key.

    No. A set is a subset of itself so a CK is a subset of itself so a CK always has a subset that is a CK--itself. What you mean is, no proper/smaller subset. Then your statement is true. But also true and more important is that no proper/smaller subset of a CK is a superkey.

    You don't actually define "CK" in this paragraph. A CK of a given table can be defined as a superkey of that table that contains no proper/smaller subset that is a superkey of that table.

    Primary key - A chosen candidate key that becomes the attribute to uniquely identify a row.

    No. The PK (primary key) of a given table is defined as the one CK of that table that you decided to call the PK. (Not attribute.)

    Note that CKs & PKs are superkeys. PKs don't matter to relational theory.

    To create the relation between both:

    AuthorBook(authorID, BookID) -- together authorID and BookID are primary key
    

    What the superkeys & CKs are & so what the PK can be is determined by the FDs (functional dependencies) that hold in the table. But if you are presuming that this is a many to many table then it takes an authorID-bookID pair to uniquely identify a row, so there can only be one CK, {authorID, bookID}. So that is the only possible PK. So {authorID} & {bookID} cannot be superkeys or CKs or PKs.

    You can see this by looking at examples & applying the definitions.

    authorID bookID
          1      a
          1      b
    

    Here authorID does not uniquely identify a row. So it can't be a superkey. So it can't be a CK. So it can't be a PK.

    textbooks I have read seem to define junction tables this way and define primary keys this way

    No, they don't.

    However they do say that certain sets of attributes & subsets of superkeys, CKs & PKs in the junction table are FKs (foreign keys) in the junction table referencing those other tables where they are CKs (which might be PKs) of/in those other tables.

    A FK of a given table can be defined as a certain set of attributes in the table whose subrow values must appear as certain CK subrows in a certain other table.

    But since you say this is a junction table, presumably {authorID} is a FK to an author table where its values appear under a CK/PK & {bookID} is a FK to a book table where its values appear under a CK/PK. So FK {authorID} in AuthorBook referencing {authorID} in Author & FK {bookID} in AuthorBook referencing {bookID} in Book.

    PS PK & other terms mean something else in SQL. A declared SQL PK can have a smaller SQL UNIQUE declared within it. SQL "uniqueness" itself is defined in terms of SQL NULL. It's reasonable to say that an SQL PK is more reminiscent a relational superkey than it is reminiscent of a relational PK. Similarly a SQL FK is more reminiscent of what we could reasonably call a relational foreign superkey than a relational foreign key.