Currently learning data science with SQLite and Pandas. Working on a political contributions dataset. I'm wondering what is the purpose of WHERE contributors.candidate_id = candidates.id
in the code below when joining two tables. Given my aim is to extract contributors for a particular candidate and include their name from another table, would the output not be the same without this condition?
implicit_join_sel = """
SELECT
contributors.last_name, contributors.first_name, contributors.amount, candidates.last_name
FROM
contributors, candidates
WHERE
contributors.candidate_id = candidates.id
AND candidates.last_name = 'Obama';
"""
db.cursor().execute(implicit_join_sel)
Output:
contributors.last_name contributors.first_name contributors.amount candidates.last_name
0 Buckler Steve 50.0 Obama
1 Buckler Steve 25.0 Obama
2 Buckheit Bruce 100.0 Obama
3 Buckel Linda 2300.0 Obama
4 Buckel Linda -2300.0 Obama
FYI summary of tables:
--Candidates--
id first_name last_name middle_name party
0 33 Joseph Biden NaN D
1 36 Samuel Brownback NaN R
--Contributors--
last_name first_name middle_name street_1 street_2 city state zip amount date candidate_id
0 Agee Steven NaN 549 Laurel Branch Road NaN Floyd VA 24091 500.0 2007-06-30 16
4 Akin Charles NaN 10187 Sugar Creek Road NaN Bentonville AR 72712 100.0 2007-06-16 16
The sytax of the FROM
clause, alone, would cause every row of the contributors
table to be joined to every row of the candidates
table. The WHERE
clause limits the join to only those pairs of rows with the same candidate ID.
The form of your FROM
and WHERE
clauses is not as clear as more modern SQL syntax, which would express this as
FROM
contributors
inner join candidates on contributors.candidate_id = candidates.id