pythonsqlsqlitepython-db-api

Correct use of keys when joining tables in SQL


Currently learning data science with SQLite and Pandas. Working on a political contributions dataset. I'm wondering what is the purpose of WHERE contributors.candidate_id = candidates.id in the code below when joining two tables. Given my aim is to extract contributors for a particular candidate and include their name from another table, would the output not be the same without this condition?

implicit_join_sel = """
SELECT
    contributors.last_name, contributors.first_name, contributors.amount, candidates.last_name
FROM
    contributors, candidates
WHERE
    contributors.candidate_id = candidates.id
AND candidates.last_name = 'Obama';
"""

db.cursor().execute(implicit_join_sel)

Output:

contributors.last_name  contributors.first_name contributors.amount candidates.last_name
0   Buckler Steve   50.0    Obama
1   Buckler Steve   25.0    Obama
2   Buckheit    Bruce   100.0   Obama
3   Buckel  Linda   2300.0  Obama
4   Buckel  Linda   -2300.0 Obama

FYI summary of tables:

--Candidates--
    id  first_name  last_name   middle_name party
0   33  Joseph  Biden   NaN D
1   36  Samuel  Brownback   NaN R

--Contributors--
    last_name   first_name  middle_name street_1    street_2    city    state   zip amount  date    candidate_id
0   Agee    Steven  NaN 549 Laurel Branch Road  NaN Floyd   VA  24091   500.0   2007-06-30  16
4   Akin    Charles NaN 10187 Sugar Creek Road  NaN Bentonville AR  72712   100.0   2007-06-16  16

Solution

  • The sytax of the FROM clause, alone, would cause every row of the contributors table to be joined to every row of the candidates table. The WHERE clause limits the join to only those pairs of rows with the same candidate ID.

    The form of your FROM and WHERE clauses is not as clear as more modern SQL syntax, which would express this as

    FROM
       contributors
       inner join candidates on contributors.candidate_id = candidates.id