relational-algebra

Relational algebra - recode column values


Suppose I have a table 'animals' whose rows represent different animals and there is a column species that might have values like 'cat', 'dog', 'horse', 'cow' etc. Suppose I am only interested in whether the animal is a dog or not. In SQL I am able to make a query like select (species='dog') as isDog from animals to return 1 for dogs and 0 otherwise. How can I express this in relational algebra? It is not selecting because we are not limiting rows. Can I use the project operator even though my expression (species='dog') is not an attribute?

I want to achieve what would result by using the project operator on a column that does not exist but is based on the truth value of an expression. For example consider the table animals containing rows with just one column 'species' having rows for cat, dog, horse, and cow. I need the boolean value that could be renamed to 'isDog' that would result in values 0,1,0, and 0 (1=true, 0=false). I get this information in MySQL by selecting (species='dog') as isDog and I wonder if it is valid RA to use the project operator with (species='dog') to pick such a dynamically created column. Or is there some other way?

(I already have created the SQL queries for the project I am working on. But now I should report the results avoiding SQL code. A more compact mathematical format is OK, so relational algebra should be acceptable. But the animal example is still valid in pointing out my problem.)


Solution

  • TL;DR To introduce specific values into a relational algebra expression you have to have a way to write table literals. Usually the necessary operators are not made explicit, but on the other hand algebra exercises frequently use some kind of notation for example values.

    For your case using the simplest additional relations (with an ad hoc table literal notation similar to SQL VALUES):

        (restrict SPECIES=dog Animals) natural join TABLE{ISDOG}{<1>}
    union (restrict SPECIES<>dog Animals) natural join TABLE{ISDOG}{<0>}
    

    If you want a reference to a relational algebra with an operator for general calculations see author Chris Date's EXTEND operator and his classic textbook An Introduction to Database Systems, 8th Edition.


    More generally: To output a table with rows with new columns with values that are arbitrary functions of values in input rows we have to have access to tables corresponding to operators. Either we use a table literal or we assume a boolean function and use its name as a table name.

    It turns out that the relational model was designed so that:

    A proposition is a statement. A predicate is a statement template parameterized by names. If each base table holds the rows that make a true proposition from a predicate parameterized by its columns then a query holds the rows that make a true proposition from a corresponding predicate parameterized by its columns.

    /* table of rows where
    animal named NAME is AGE years old ... and is of species SPECIES
    */
    Animals
    
    /*
    table of rows where
        animal named NAME is AGE years old ... and is of species SPECIES
    AND if SPECIES=dog then ISDOG=1 ELSE ISDOG=0
    
    ie rows where
        animal named NAME is AGE years old ... and is of species SPECIES
    AND (SPECIES=dog AND ISDOG=1 OR SPECIES<>DOG AND ISDOG=0)
    
    ie rows where
            animal named NAME is AGE years old ... and is of species SPECIES
        AND SPECIES=dog AND ISDOG=1
    OR      animal named NAME is AGE years old ... and is of species SPECIES
        AND SPECIES<>dog AND ISDOG=0
    */
        (restrict SPECIES=dog Animals) natural join TABLE{ISDOG}{<1>}
    union (restrict SPECIES<>dog Animals) natural join TABLE{ISDOG}{<0>}
    

    So you can just use logic, the language of precision in engineering (including software engineering), science (including computer science) and mathematics, to describe your result tables.

    /* table of rows where
        animal named NAME is AGE years old ... and is of species SPECIES
    AND ISDOG=(if SPECIES=dog then 1 else 0)
    */
    

    So you can use table expressions and/or logic expressions in specifications, whichever happens to be clearer, on a (sub)expression-by-(sub)expression basis.

        Animals
    natural join
        table of rows where if SPECIES=dog then ISDOG=1 else 0
    

    (The table corresponding to that IF expression has a row for every string, and the 'dog' row is the only one with a 1.)

    (SQL ON & WHERE have this form of a table on the left and predicate with functions on the right.)

    An algebra expression calculates the rows that satisfy its corresponding logic expression.

    It might not be obvious to you what equivalent relation expressions correspond to what equivalent logic expressions and vice versa. But all that matters is that your clients understand the algebra and/or logic in the specification and your programmers can write an equivalent SQL expression.

    Instead of so natural joining a table representing arguments to a table representing a function, you can

    1. cross join each one-row table holding a function result to the restriction of the argument table to rows that give that result then
    2. union the cross joins.
        (restrict SPECIES=dog Animals) natural join TABLE{ISDOG}{<1>}
    union (restrict SPECIES<>dog Animals) natural join TABLE{ISDOG}{<0>}
    

    See also: Is multiplication allowed in relational algebra?
    And: Relational Algebra rule for column transformation
    Re querying relationally: Relational algebra for banking scenario
    Re understanding the semantics of relational algebra & SQL (plus more links): Is there any rule of thumb to construct SQL query from a human-readable description?

    PS SQL SELECT does things that projection does, but it also does other things that aren't projection, which get done in the algebra by rename, join and/or table literals. What you want isn't projection. It is called EXTEND by author Chris Date. I would advise anyone to use/reference Date's algebra. Although adding RESTRICT/WHERE and EXTEND on arbitrary logic expressions (wffs & terms) begs the question of how one deals with the logic expressions algebraically. This answer explains that/how you can always algebraically express the logic expressions given literal and/or operator tables.