sqlpostgresqlplpgsqlsql-optimization

Avoid checks for each row, replace query by function?


I have teams:

create table team (
    id      integer     primary key,
    type    text
);

Also, I have players:

create table player
(
    id      integer     primary key,
    age     integer,
    team_id integer     references team(id)
);

The type of a team can either be 'YOUTH' or 'ADULT'. In a youth team, only players starting from age 16 are allowed in official games. In an adult team, only players starting from age 18 are allowed in official games.

Given a team identifier, I want to find all allowed players for the upcoming game. I have the following query:

select    player.*
from      player
join      team
on        player.team_id = team.id
where     team.id = 1 and
          (
              (team.type = 'YOUTH' and player.age >= 16) or
              (team.type = 'ADULT' and player.age >= 18)
          );

This works. However, in this query, for every player I am repetitively checking the type of the team. This value will stay the same during the whole query.

Is there a way to improve this query? Should I replace it with a pgplsql function where I first store the team into a local variable, and then discriminate with following flow?

IF team.type = 'YOUTH' THEN <youth query> ELSE <adult query> END IF

For me, that feels like killing a fly with a bazooka, but I don't see an alternative right now.

I created a SQL fiddle: http://rextester.com/TPFA20157


Solution

  • Auxiliary table

    In (strict relational) theory, you would have another table storing attributes of team types like the minimum age.

    NEVER store "age", though, which is a function of the underlying constant birthday and the current time. Always store the birthday. Could look like this:

    CREATE TABLE team_type (
       team_type text PRIMARY KEY
     , min_age   int NOT NULL  -- in years
    );
    
    CREATE TABLE team (
       team_id   integer PRIMARY KEY
     , team_type text NOT NULL REFERENCES team_type
    );
    
    CREATE TABLE player (
       player_id serial  PRIMARY KEY
     , birthday  date NOT NULL   -- NEVER store "age", it's outdated the next day
     , team_id   integer REFERENCES team
    );
    

    Query:

    SELECT p.*, age(now(), p.birthday) AS current_age
    FROM   player    p
    JOIN   team      t  USING (team_id)
    JOIN   team_type tt USING (team_type)
    WHERE  t.team_id = 1
    AND    p.birthday <= now() - interval '1 year' * tt.min_age;
    

    Using the function age() to display current age, which fits the conventional algorithm to determine age.

    But using the more efficient expression p.birthday <= now() - interval '1 year' * tt.min_age in the WHERE clause.

    Aside: the current date depends on the current time zone, so the result can vary +/- 12 hours, depending on the time zone setting of the session. Details:

    Alternative: function

    But yes, you could replace the table team_type with the logic encapsulated in a function like this:

    CREATE FUNCTION f_bday_for_team_type(text)
      RETURNS date
      LANGUAGE sql STABLE AS
    $func$
    SELECT (now() - interval '1 year' * CASE $1 WHEN 'YOUTH' THEN 16
                                                WHEN 'ADULT' THEN 18 END)::date
    $func$;
    

    Calculating the maximum birthday to fulfill the minimum age for the given team type. The function is STABLE (not VOLATILE) as one might assume. The manual:

    Also note that the current_timestamp family of functions qualify as stable, since their values do not change within a transaction.

    Query:

    SELECT p.*, age(now(), p.birthday) AS current_age
    FROM   player p
    JOIN   team   t USING (team_id)
         , f_bday_for_team_type(t.team_type) AS max_bday  -- implicit CROSS JOIN LATERAL
    WHERE  t.team_id = 2
    AND    p.birthday <= max_bday;
    

    Not the holy grail of relational theory, but it works.

    fiddle