database relational-database third-normal-form

3rd normal form with unique values


I have a table USER with schema USER(user_id, email, first_name, last_name, ...). Email is a unique value in my database, and user_id is the primary key. So, user_id and email are both candidate keys. Does that mean I have a transitive dependency here (user_id -> email -> (first_name, last_name, ...)), and thus that the database is not in 3NF?
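The chain in the question can be tested mechanically. Below is a minimal sketch, assuming the common textbook formulation of 3NF (for every listed dependency X -> A, either X is a superkey or A belongs to some candidate key); that formulation is an assumption and is not taken from this thread. The function names, and the `country_code` / `country_name` columns in the contrast case, are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            # Smaller keys are found first, so a superset of one is not minimal.
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations_3nf(attrs, fds):
    """List (determinant, attribute) pairs that break the textbook 3NF test."""
    prime = frozenset().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attrs:
            continue  # the determinant is a superkey, so this dependency is fine
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((lhs, a))
    return bad


# The schema from the question: both user_id and email determine everything.
ATTRS = frozenset({"user_id", "email", "first_name", "last_name"})
FDS = [
    (frozenset({"user_id"}), frozenset({"email", "first_name", "last_name"})),
    (frozenset({"email"}), frozenset({"user_id", "first_name", "last_name"})),
]

# Hypothetical contrast: a determinant that is not a key fixes a non-key column.
ATTRS2 = ATTRS | {"country_code", "country_name"}
FDS2 = [
    (frozenset({"user_id"}),
     frozenset({"email", "first_name", "last_name", "country_code"})),
    (frozenset({"email"}),
     frozenset({"user_id", "first_name", "last_name", "country_code"})),
    (frozenset({"country_code"}), frozenset({"country_name"})),
]

print(candidate_keys(ATTRS, FDS))
print(violations_3nf(ATTRS, FDS))
print(violations_3nf(ATTRS2, FDS2))
```

Under these assumptions the first relation reports no violations, because email determines the whole row and so is not an intermediate step in a transitive chain. Only the contrast case, where `country_code` is not a key, is flagged.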


Solution

  • Noting the Relational Database tag.

    The main problem, that confuses everything, is that there is one Relational Model, and then various nonsense that is promoted by others as "relational".

    1. The one and only original Relational Model by Dr E F Codd.
    2. The Date & Darwen & Fagin version, which is the Record Filing System of the 1960's, pre-database as well as pre-Relational: the very thing that the RM replaces. Plus a couple of fragments (not whole concepts) from the Relational Model. Promoted and marketed heavily as "relational", which is not correct.

    The Relational Model is freely available (in those days, only qualified people attempted database design). However, the terms are dated and thus not understood today. And it is seminal, dense. The proponents count on that, in order to promote their approach as "relational".


    The Relational Model

    I will answer for the Relational Model. In order to avoid going over the same item more than once, I will take the issues in logical sequence.

    1. The RM requires that each row (not record, because it is beyond records) is unique.

    2. In the RM, one or more Unique Keys are selected for each table. Each Key must be "made up from the data". Further, each Key must be non-redundant, the minimal Key. If there is more than one Key, one is chosen as Primary Key, which is migrated to the subordinate rows as a Foreign Key. Once the Primary Key is chosen, any Keys that remain are Alternate Keys.

    While the Keys may be loosely called "candidates" before the election, after the election they are no longer "candidates", they are losers.

    • The use of the term "candidate" serves only to (a) maintain the tension of not choosing a Primary Key from one of the "candidates" (as required by the RM), and (b) thus allow a non-Key (such as a fabricated Record ID or user_id) to be declared as a PRIMARY KEY.

    • A Record ID field does not exist in the data, it is manufactured by the system (GUID; AUTO INCREMENT; etc). Such a field is great for perceiving the data in physical terms (RFS), as if it were a stupefying grid, and therefore suppressing the perception of data as logical components that are related (the RM).

    • A genuine Key has many important properties. Declaring a non-Key to be a PRIMARY KEY, which is possible in SQL, does not magically give the non-Key any of those properties.
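The point about a declared PRIMARY KEY can be illustrated with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module; the table names `user_record_id` and `user_keyed` and the sample row are hypothetical, and it assumes that no separate UNIQUE constraint has been declared on email in the Record ID table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with a system-manufactured Record ID declared as PRIMARY KEY.
conn.execute("""
    CREATE TABLE user_record_id (
        user_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email      TEXT NOT NULL,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    )
""")

# Hypothetical table whose PRIMARY KEY is made up from the data.
conn.execute("""
    CREATE TABLE user_keyed (
        email      TEXT NOT NULL PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    )
""")

row = ("fred@example.com", "Fred", "Bloggs")

# The Record ID table accepts the same data twice; only user_id differs.
for _ in range(2):
    conn.execute(
        "INSERT INTO user_record_id (email, first_name, last_name) VALUES (?, ?, ?)",
        row,
    )
duplicates = conn.execute(
    "SELECT COUNT(*) FROM user_record_id WHERE email = ?", (row[0],)
).fetchone()[0]

# The table keyed on email rejects the second copy of the same row.
conn.execute("INSERT INTO user_keyed VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO user_keyed VALUES (?, ?, ?)", row)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("copies stored under the Record ID:", duplicates)
print("duplicate rejected by the email Key:", rejected)
```

If a UNIQUE constraint on email were added to the first table, the duplicate would be rejected there as well, but the rejection would come from email, not from user_id.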

    3. You have recognised that email is a unique identifier. Great. In fact, for that schema, it is the only Key, so you do not have to concern yourself with choosing one from many possibles.

    Here is a comparison.



    Now you want to check if the table satisfies 3NF. Dmitry's intention is correct, although his definition might not be. Dr E F Codd's 3NF, not the pretenders:


    1960's Record Filing System

    Sorry, I can't help you there, because it is an ever-changing and unreliable mess that deals with fragments of data instead of recognising the atoms, and they will argue endlessly, resolving nothing. They work on notions that are not in the RM, and therefore fraudulently called "relational", while retaining the essential parts of the 1960's Record-oriented paradigm: the additional field and additional index for the physical Record ID; and the reference by physical Record ID. Both of which are prohibited by the RM.


    Comments

    This section is presented in accordance with the SO guidelines, specifically: to correct misinformation whenever you see it. I did respond to the comments, but they keep disappearing. Thus I have placed it here.

    philipxy:
    Codd's 1971 [paper] "Further normalization of the data base relational model" introduced "normalization" in the sense of decomposition to higher NFs, including "FD", "CK", "one of its CKs is arbitrarily designated as PK", "2NF" in terms of "partial/full FDs" & "3NF" in terms of "transitive/non-transitive FDs".

    1. That quote is from the 1971 paper, not the Jun 1970 paper The Relational Model. They are two different papers. Therefore it is confirmed:

    as such, the 1971 paper can be dismissed.

    1. As evidenced, Codd wrote about twelve additional papers during the decade (1970 to 1980) that he was trying to get the RM accepted. They have no value except for historical purposes, to examine the way he responded to the upheaval of the DBMS platform suppliers, that was caused by the RM: it was a paradigm shift.

    The 1971 paper, and the "RM/Tasmania" articles and presentations, have the explicit purpose of assisting the then entrenched DBMS platform users to implement a bit of relational capability into their systems without changing the platform, the reference-by-physical-pointer paradigm (mindset & implementation).

    After the Relational Model became accepted, around 1985, when all the DBMS platform suppliers started switching to supplying RDBMS platforms, ie. the reference-by-physical-pointer platforms became extinct, the 1971 paper, which was previously near and dear to them, became obsolete.

    Therefore, again, the 1971 paper can be dismissed.

    1. The only article (not a formal paper, but widely accepted as one) that is relevant to the RM is the 1985 ComputerWorld article commonly known as Codd's Twelve Rules, which gives the rules for both (a) a DBMS Platform, and (b) a database, to be accepted as genuinely Relational. That was to overcome the problem of DBMS platform suppliers adding Relational bits and pieces and then labelling their product "relational".

    2. Still confused? A scientific person would:

    1. The Date, Darwen, Fagin, et al, and all their followers (authors, professors, lecturers, etc), are not stupid. Therefore the following evidence acts:

    are incorrect, and are in disagreement with the original relational model.

    Institutionalised Suppression of the Relational Model

    It is reinforced by additional acts:

    Erwin Snout:
    When cut down to its bare essence, the relational model of data has no more than exactly one "rule" : all information in the database must be represented as values of attributes in tuples in relations.

    There are over 40 rules in the Relational Model and the Twelve Rules. Reducing them to one pithy rule is not compatible with the RM.