The Relational Model & Queries That Naturally Return Duplicate Rows


It's commonly understood that in the relational model:

  1. Every relational operation should yield a relation.
  2. Relations, being sets, cannot contain duplicate rows.

Imagine a 'USERS' relation that contains the following data.

ID FIRST_NAME LAST_NAME
 1 Mark       Stone
 2 Jane       Stone
 3 Michael    Stone

If someone runs the query SELECT LAST_NAME FROM USERS, a typical database will return:

LAST_NAME
Stone
Stone
Stone

Since this is not a relation - because it contains duplicate rows - what should an ideal RDBMS return?


Solution

  • "But some information is lost - that there are 3 users with that last name."

    If the count of users with that name is what you are interested in, then the query in your example is not the question you should be asking.

    The query in your example answers the question "What are all the last names such that there exists a user that has that last name?". In the relational model, the answer to that question is a relation with a single row: Stone.

    If the question you want to ask is "How many users are there that are named 'Stone'?", then the query you should submit is SELECT COUNT(...) FROM users WHERE last_name = 'Stone';

    Projection always "loses" information: the information that is tied to the attributes that are projected away. I don't see how a known property of a useful relational operator can be presented as an argument against that operator.
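
To make the distinction concrete, here is a minimal sketch using SQLite through Python's sqlite3 module. The choice of SQLite, the column types, and the use of COUNT(*) are my assumptions for illustration; the question does not name a particular database. It contrasts SQL's default bag-style projection with the set-style projection the relational model calls for, and then asks the counting question separately.

```python
import sqlite3

# In-memory database holding the USERS example from the question.
# Column types are an assumption; the question only shows the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "Mark", "Stone"), (2, "Jane", "Stone"), (3, "Michael", "Stone")],
)

# SQL's default projection keeps one output row per input row (a bag).
bag_rows = conn.execute("SELECT last_name FROM users").fetchall()

# DISTINCT yields the set-based projection: each last name appears once.
set_rows = conn.execute("SELECT DISTINCT last_name FROM users").fetchall()

# "How many users are named Stone?" is a different question, so it gets
# a different query. COUNT(*) is an assumed argument for illustration.
(stone_count,) = conn.execute(
    "SELECT COUNT(*) FROM users WHERE last_name = ?", ("Stone",)
).fetchone()

# Both facts together, still as a duplicate-free result.
name_counts = conn.execute(
    "SELECT last_name, COUNT(*) FROM users GROUP BY last_name"
).fetchall()

print(bag_rows)     # [('Stone',), ('Stone',), ('Stone',)]
print(set_rows)     # [('Stone',)]
print(stone_count)  # 3
print(name_counts)  # [('Stone', 3)]
conn.close()
```

The last query shows that nothing forces a choice between the two: if you want the names and how often each occurs, you can ask for exactly that, and the result contains no duplicate rows.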