database functional-dependencies candidate-key

Determining candidate keys from functional dependencies: Exceptions where we don't use the minimal super key?

I'm currently trying to understand functional dependencies and how to derive can candidate keys from them. In an assignment I was given the following relation R

Lecture	Room	Date	NStudents	Lecturer	Tool
Graphical DP	209	Wed. 1	53	0210	Overhead-Pr.
Graphical DP	209	Fr. 3	53	0210	Overhead-Pr.
C	413	Tue. 1	86	0111	PC
C	413	Tue. 1	86	0111	Overhead-Pr.
C	413	Wed. 3	86	0111	PC
C	413	Wed. 3	86	0111	Overhead-Pr.
Mathematics	418	Mo. 1	76	0342	Board
Mathematics	318	Thur. 2	76	0342	Board
Data Structures	310	Fr. 2	32	0550	Board
Data Structures	310	Fr. 2	32	0550	Overhead-Pr.

The first task was to find all meaningful functional dependencies in the relation R.

Functional dependency is defined in our lecture as

For a relation R with arbitrary attribute sets X and Y, Y is functionally dependent on X (we also say X determines Y) if for all (x1,y1) and all (x2,y2) with x1,x2∈X and y1,y2∈Y holds: x1=x2 ⇒ y1=y2, i.e., any same combination of values in X must condition the same combination of values in Y.

Using this definition I've identified the following set of functional dependencies

FD = {Lecture -> (NStudents, Lecturer), (Room, Date) -> (Lecture, Lecturer), (Lecture, Date) -> (Room, Lecturer), (Date, Lecturer) -> Lecture, (Date, Tool) -> Lecture}

However, I don't know if they are meaningful because it wasn't specified what meaningful exactly means.

The next task is to identify candidate keys.

In our lecture a super key is defined as

Let A be the set of all attributes of a relation R and let K be an arbitrary attribute set of R.

K is called a superkey if K->A

Fully functional dependency is defined in our lecture as

A functional dependency X -> Y is a full functional dependency if there is no Z⊂X for which Z->Y holds, otherwise X->Y is a partial functional dependency.

Thus, a candidate key is defined as

If K->A is a full functional dependency, then K is called a candidate key of R

Next, I've constructed attribute closure sets to find a set of attributes which satisfy the definition of the candidate keys

(Lecture)+ = {NStudents, Lecturer}
(Room, Date)+ = {Room, Date, Lecture, Lecturer}

As NStudents can be derived from Lecture I add the attribute to the closure set (Room, Date)+ thus

(Room, Date)+ = {Room, Date, Lecture, NStudents, Lecturer}
(Lecture, Date)+ = {Lecture, Date, Room, Lecturer}

as NStudents can be derived from Lecture, I add NStudents to the attribute closure set, again

(Lecture, Date)+ = {Lecture, Date, Room, Lecturer, NStudents}
(Date, Lecturer)+ = {Date, Lecturer, Lecture}

as NStudents can be derived from Lecture and Room from Lecture and Date I add those attributes to the closure set

(Date,Lecturer)+ = {Date, Lecturer, Lecture, NStudents, Room}
(Date, Tool) + = {Date, Tool, Lecture}

NStudents and Lecturer can be derived from Lecture, Room can be derived from Date and Lecturer or Date and Lecture, so I add those attributes to the closure set

(Date, Tool) + = {Date, Tool, Lecture, NStudents, Lecturer, Room}

If I'm not missing anything, the attribute set {Date, Tool} is a fully functional dependency as there is no Z⊂X for which Z->Y.

However, the solution to the exercise are the following attribute sets:

{Lecture, Tool, Date}
{Room, Tool, Date}
{Lecturer, Tool, Date}

I'm wondering why another attribute is added? Doesn't this step make the attribute set just a partially functional dependency?

Solution

Given a certain model of reality through a relation, the Functional Dependencies holding in it arise from the meaning of the data represented. A sample data set, as in your example, can only give vague suggestions of such meaning, but cannot express completely it.

For instance, from your example one could conclude that Lecturer -> Lecture (and viceversa), but from the data alone we cannot be sure that same Lecturer will give two different Lecture at a certain time (or viceversa). Another example of inferred dependency could be Room -> Lecture, since for each room there is a unique, different lecture.

So, the correct way of giving a set of functional dependencies is to know the meaning of the data and express the dependency among them with this notation.

Note, moreover, that different sets of functional dependencies can be equivalent (technically they are called a “cover” of the dependencies holding in a relation), describing the same meaning in different ways.

So, when it is required to give a set of FD holding in a certain situation, we try to give a minimal set of dependencies, since there are methods the allows us to find all the derived dependencies from them, for instance trough the so-called Armstrong’s Axioms.

When an exercise is proposed, like in your case, in which we should find the “meaningful functional dependencies”, the only thing that we can do is in part look at the data and in part try to understand the meaning of the data, to find a set of FD which is not too much redundant, yet that captures as much as possible the meaning of the data. And in general minimum means mostly two things: a) with the smallest possible set of attributes, and b) with dependencies that cannot be derived from others.

For instance, from your data one could produce a set of this kind:

Lecture -> Lecturer
Lecturer -> Lecture
Room -> Lecture
Lecture -> NStudents
NStudents -> Lecture
Tool, Date -> Room

Is almost impossible to know if the set is complete and is true in the general case (for instance, there could be lectures with the same number of students, but with different students, and only by chance the number of students is equal), although we can think that this is “reasonable” given the data shown.

In this case, assuming that the FD above are a cover of the dependences of the relation (all the other FD could be derived from the above set), you are correct in considering that the only candidate key is (Tool, Date).

But maybe in your teacher’s mind there are other data not present (or other information about the model) that contradicts this set of FD, so the answer to the exercise is different from the inferences that we can do from the data.