php mysql indexing database-design database-performance

Should relational tables contain duplicate data to speed up queries?


I have a MySQL database with 4 tables:

job
job_application
client
candidate

Each table has its own primary key, i.e. job_id, job_application_id, client_id, candidate_id.

Employers in the client table can post jobs in the job table. The job table contains a client_id field that identifies the client.

Candidates in the candidate table can apply for a job, which inserts a row into the job_application table. The job_application table contains a job_id field and a candidate_id field to identify the job and the candidate who applied for it.

I've run into a bit of a problem writing the queries for employers to manage the job applications they receive. As an example, here is a function I wrote that deletes rows from job_application:

public function deleteJobApplications($job_application_ids) {
    // Cast every ID to an integer before interpolating it into the IN list.
    $ids = implode("','", array_map('intval', $job_application_ids));
    $this->db->query("DELETE ja.* FROM " . DB_PREFIX . "job_application ja"
        . " LEFT JOIN " . DB_PREFIX . "job j ON (j.job_id = ja.job_id)"
        . " WHERE ja.job_application_id IN ('" . $ids . "')"
        . " AND j.client_id = '" . (int)$this->client->getClientId() . "'");
}

Because client_id is stored only in the job table, I need to LEFT JOIN the job table every time I want to UPDATE or DELETE from the job_application table.

Should I add another client_id field to the job_application table, essentially duplicating data already held in the database, or continue with the LEFT JOIN for every UPDATE and DELETE?


Solution

  • Your problem isn't that you need to denormalize "job_applications" by introducing "client_id" as a redundant column. (The currently accepted answer is factually incorrect in that regard.) Your problem is that you didn't normalize correctly in the first place. If you had, the column "client_id" would already be in that table, and the problem would never have arisen.

    Let's pretend that candidate names, client names, and job names are globally unique.

    A table that looks like this will satisfy the predicate "Person named <candidate_name> applies for <job_name> at company <client_name>."

    job_applications
    Person named <candidate_name> applies for <job_name> at company <client_name>.
    
    client_name  job_name                candidate_name  
    --
    Microsoft    C++ programmer, Excel   Ed Wood 
    Microsoft    C++ programmer, Excel   Dane Crute 
    Microsoft    C++ programmer, Excel   Vim Winder
    Microsoft    C++ programmer, Word    Wil Krug
    Microsoft    C++ programmer, Word    Val Stein
    Google       Python coder, search    Ed Wood
    Google       Programmer, compilers   Ed Wood
    Google       Programmer, compilers   Val Stein
    

    Three columns, no id numbers, no nulls, no nonprime attributes, all key. This relation is in 6NF.

    It should be obvious that you could create a table for jobs (or job offers) by selecting distinct values from the first two columns. The foreign key reference is obvious.

    jobs
    Company named <client_name> offers <job_name>.
    
    client_name  job_name
    --
    Microsoft    C++ programmer, Excel
    Microsoft    C++ programmer, Word
    Google       Python coder, search
    Google       Programmer, compilers
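
    As a rough sketch of that projection in MySQL (assuming the three-column table above exists under the name "job_applications"), the rows of "jobs" are just the distinct pairs from the first two columns:

    ```sql
    -- Sketch: derive the job offers from the applications table.
    -- Table and column names follow the example above.
    SELECT DISTINCT client_name, job_name
    FROM job_applications;
    ```

    With the sample rows above, this returns the four client/job pairs listed in "jobs".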
    

    In a similar way, you can select distinct values from the first column alone for a set of companies, and from the last column alone for a set of applicants. Again, the foreign key references should be obvious.

    clients
    Company named <client_name> is a client.
    
    client_name
    --
    Microsoft
    Google
    
    candidates
    Person named <candidate_name> is looking for a job.
    
    candidate_name  
    --
    Ed Wood 
    Dane Crute 
    Vim Winder
    Wil Krug
    Val Stein
    

    All those tables are in 6NF.
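
    For concreteness, here is one way the four natural-key tables could be declared in MySQL. This is only a sketch of my reading of the "obvious" foreign keys: the VARCHAR(100) lengths are arbitrary assumptions, and it relies on the pretense that names are unique.

    ```sql
    -- Sketch only: column types and lengths are assumptions.
    CREATE TABLE clients (
      client_name VARCHAR(100) NOT NULL,
      PRIMARY KEY (client_name)
    );

    CREATE TABLE candidates (
      candidate_name VARCHAR(100) NOT NULL,
      PRIMARY KEY (candidate_name)
    );

    CREATE TABLE jobs (
      client_name VARCHAR(100) NOT NULL,
      job_name    VARCHAR(100) NOT NULL,
      PRIMARY KEY (client_name, job_name),
      -- A job offer must belong to a known client.
      FOREIGN KEY (client_name) REFERENCES clients (client_name)
    );

    CREATE TABLE job_applications (
      client_name    VARCHAR(100) NOT NULL,
      job_name       VARCHAR(100) NOT NULL,
      candidate_name VARCHAR(100) NOT NULL,
      PRIMARY KEY (client_name, job_name, candidate_name),
      -- An application must point at an existing offer (both columns together).
      FOREIGN KEY (client_name, job_name) REFERENCES jobs (client_name, job_name),
      -- The applicant must be a known candidate.
      FOREIGN KEY (candidate_name) REFERENCES candidates (candidate_name)
    );
    ```

    The two-column foreign key is the important part: it is what carries the client into "job_applications" without any risk of it disagreeing with "jobs".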

    Augmenting a table with a surrogate key in addition to its natural keys doesn't change the normal form when you do it correctly. Let's replace the natural keys in "job_applications" with your surrogate ID numbers. With that replacement, your table looks like this. (In practice, you'd do the same thing in the other tables, too.)

    job_applications
    --
    client_id
    job_id
    candidate_id
    primary key (client_id, job_id, candidate_id)
    other columns go here...
    

    Note that client_id is already in there. If there are no other columns, you're still in at least 5NF.
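
    A minimal MySQL sketch of the surrogate-key version, using the table names from the question. Several things here are my assumptions, not something stated above: the INT UNSIGNED types, the index name, keeping job_application_id as an extra unique surrogate so the existing PHP code can still address rows by it, and leaving out the DB_PREFIX table prefix.

    ```sql
    -- Sketch only: types, index names, and the extra surrogate column are assumptions.
    CREATE TABLE client (
      client_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (client_id)
    );

    CREATE TABLE candidate (
      candidate_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (candidate_id)
    );

    CREATE TABLE job (
      job_id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
      client_id INT UNSIGNED NOT NULL,
      PRIMARY KEY (job_id),
      -- Gives the composite foreign key below something to reference.
      UNIQUE KEY uq_job_client (client_id, job_id),
      FOREIGN KEY (client_id) REFERENCES client (client_id)
    );

    CREATE TABLE job_application (
      job_application_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      client_id          INT UNSIGNED NOT NULL,
      job_id             INT UNSIGNED NOT NULL,
      candidate_id       INT UNSIGNED NOT NULL,
      PRIMARY KEY (client_id, job_id, candidate_id),
      UNIQUE KEY (job_application_id),
      -- client_id cannot drift away from the job's real client.
      FOREIGN KEY (client_id, job_id) REFERENCES job (client_id, job_id),
      FOREIGN KEY (candidate_id) REFERENCES candidate (candidate_id)
    );

    -- The delete from the question no longer needs a join.
    -- The literal IDs are placeholders for illustration.
    DELETE FROM job_application
    WHERE job_application_id IN (101, 102, 103)
      AND client_id = 42;
    ```

    Because the foreign key covers (client_id, job_id) as a pair, the database itself rejects an application row whose client_id doesn't match the job's client, so the extra column can't become inconsistent the way a hand-maintained duplicate could.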