mysqlsqlvisualizationset-theory

sql joins as venn diagram


I've had trouble understanding joins in sql and came upon this image which I think might help me. The problem is that I don't fully understand it. For example, the join in the top right corner of the image, which colors the full B circle red and but only the overlap from A. The image makes it seem like circle B is the primary focus of the sql statement, but the sql statement itself, by starting with A (select from A, join B), conveys the opposite impression to me, namely that A would be the focus of the sql statement.

Similarly, the image below that only includes data from the B circle, so why is A included at all in the join statement?

Question: Working clockwise from the top right and finishing in the center, can someone provide more information about the representation of each sql image, explaining

a) why a join would be necessary in each case (for example, especially in situations where no data's taken from A or B i.e. where only A or B but not both is colored)

b) and any other detail that would clarify why the image is a good representation of the sql

sql join diagram


Solution

  • I think your main underlying confusion is that when (for example) only A is highlighted in red, you're taking that to mean "the query only returns data from A", but in fact it means "the query only returns data for those cases where A has a record". The query might still contain data from B. (For cases where B does not have a record, the query will substitute NULL.)

    Similarly, the image below that only includes data from the B circle, so why is A included at all in the join statement?

    If you mean — the image where A is entirely in white, and there's a red crescent-shape for the part of B that doesn't overlap with A, then: the reason that A appears in the query is, A is how it finds the records in B that need to be excluded. (If A didn't appear in the query, then Venn diagram wouldn't have A, it would only show B, and there'd be no way to distinguish the desired records from the unwanted ones.)

    The image makes it seem like circle B is the primary focus of the sql statement, but the sql statement itself, by starting with A (select from A, join B), conveys the opposite impression to me, namely that A would be the focus of the sql statement.

    Quite right. For this reason, RIGHT JOINs are relatively uncommon; although a query that uses a LEFT JOIN can nearly always be re-ordered to use a RIGHT JOIN instead (and vice versa), usually people will write their queries with LEFT JOIN and not with RIGHT JOIN.