sqljoinleft-joinkagglefull-join

should these expressions yield the same result? u.creation_date < '2019-02-01' vs. u.creation_date <= '2019-01-31'


I am doing SQL exercise at Kaggle (Exercise: JOINs and UNIONs, 3) Initial questions and answers, Part 2) working with stackoverflow database.

For some reason, the query with u.creation_date <= '2019-01-31' does not seem give the same result as u.creation_date < '2019-02-01'. Which is confusing to me. Is there a rule when to use <= or < with the later date?

Here's the code for context

three_tables_query = """
                     SELECT u.id AS id,
                         MIN(q.creation_date) AS q_creation_date,
                         MIN(a.creation_date) AS a_creation_date
                     FROM `bigquery-public-data.stackoverflow.users` AS u
                         LEFT JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
                             ON u.id = a.owner_user_id
                         LEFT JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
                             ON u.id = q.owner_user_id
                     WHERE u.creation_date >= '2019-01-01' and u.creation_date < '2019-02-01'
                     GROUP BY id
                    """

vs.

three_tables_query = """
                     SELECT u.id AS id,
                         MIN(q.creation_date) AS q_creation_date,
                         MIN(a.creation_date) AS a_creation_date
                     FROM `bigquery-public-data.stackoverflow.users` AS u
                         LEFT JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
                             ON u.id = a.owner_user_id
                         LEFT JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
                             ON u.id = q.owner_user_id
                     WHERE u.creation_date >= '2019-01-01' and u.creation_date <= '2019-01-31'
                     GROUP BY id
                    """

additionally, I figured using full joins instead of left join also works out. Is there a recommendation to use Left Join rather than Full Join? (compilation time or for whatever reason)

three_tables_query = """
            select u.id as id, min(q.creation_date) as q_creation_date, min(a.creation_date) as a_creation_date
            from `bigquery-public-data.stackoverflow.users` u
            full join `bigquery-public-data.stackoverflow.posts_questions` q
            on u.id = q.owner_user_id
            full join `bigquery-public-data.stackoverflow.posts_answers` a
            on u.id = a.last_editor_user_id
            where 
            u.creation_date >= '2019-01-01' and u.creation_date < '2019-02-01'
            group by u.id
              """

Solution

  • These two comparisons are equivalent

    u.creation_date < '2019-02-01'
    u.creation_date <= '2019-01-31'
    

    provided the column name tells the truth and the column really contains a date. If this is a misnomer and the column actually contains date and time, then the first expression will still get all rows for January 31, while the second will miss all rows of January 31 past midnight (i.e. only include the very first second of the day), as DATE '2019-01-31' is considered equivalent to TIMESTAMP '2019-01-31 00:00:00'.

    As to left outer joins and full outer joins: this is semantic. Use the appropriate join according to what data you want:

    -- only give me matches:
    -- example: all products with the customers who bought them
    from a [inner] join b on ... 
    
    -- give me all matches
    -- plus all A rows hat don't have a match in B
    -- example: all products with the customers who bought them
    --          plus all products that have not been bought yet
    from a left [outer] join b on ... 
    
    -- give me all matches
    -- plus all A rows hat don't have a match in B
    -- plus all B rows hat don't have a match in A
    -- example: all products with the customers who bought them
    --          plus all products that have not been bought yet
    --          plus all customers who have not bought anything yet
    from a full [outer] join b on ... 
    

    The first two are very common, while full outer joins are rather rare.