mysqlsqljoinkeymultiple-tables

MySQL JOIN 3 tables using multiple columns/keys


Complete newbie to mySQL. So any help will be appreciated.

I have 3 tables -- carts, users, actions.

carts:
+------------+-------------+-------+
| cartId     | session_id  | userId| 
+------------+-------------+-------+

users:
+----------+-------------+
| usedId   | email       |
+----------+-------------+

actions:
+-------------+------------------+---- ---------+
| session_id  | impressionAction | impressionId | 
+-------------+------------------+-----+--------+

In carts, there is one session_id per line.

In users, there is one userId per line.

In actions, there are multiple lines per session_id counting for all the actions for that session.

I would like to JOINthe three tables getting the output to be something like

+------+-------------+--------+------------------+--------------+-------+
userId | session_id  | cartId | impressionAction | impressionId | email |
+------+-------------+--------+------------------+--------------+-------+

Where there will be multiple lines per userId and session_id; essentially a flattened file. I think if we JOIN carts and users on userId resulting in say A and then JOIN A and actions' onsession_id`, we are home.

A sample expected output is:

+------------+-------------+--------+------------------+--------------+---------+
userId       | session_id  | cartId | impressionAction | impressionId | email   |
+------------+-------------+--------+------------------+--------------+---------+
| 1234       | abc3f45     | 0001   | LOGIN            | 2032         |ab@yc.com|
| 1234       | abc3f45     | 0001   | ADD              | 4372         |ab@yc.com|
| 1234       | abc3f45     | 0001   | ADD              | 4372         |ab@yc.com|
| 1234       | abc3f45     | 0001   | SENDMAIL         | ab@yc.com    |ab@yw.com| 
| 4567       | def4rg4     | 0002   | LOGIN            | 2032         |db@yw.com|
| 4567       | def4rg4     | 0002   | ADD              | 4372         |db@yw.com|
| 4567       | def4rg4     | 0002   | REMOVE           | 3210         |db@yw.com|
+------------+-------------+--------+------------------+--------------+---------+** 

I don't know how to JOIN 3 tables without one common key. I don't even know what type of join it is called.

Essentially, we are trying to join 3 tables with non-overlapping keys, gathering one common key through the first JOIN and then joining the intermediate with the third one. Is this called a CROSS JOIN? If no, is there a name?


Solution

  • Taken from your comment above

    A USER may select many products, add them to their CART; a single USER may have multiple CARTS and at the end of the event, they can EMAIL the cart to themselves; the ACTIONS of the user are stored in the actions table

    This is how I see the structure (having in mind your data)

    +---------------------+     +---------------------+     +---------------------+
    | users               |     | carts               |     | actions             |
    +---------------------+     +---------------------+     +---------------------+
    | user_id       [PK]  |--|  | cart_id       [PK]  |     | impression_id [PK]  |
    | email               |  |--| user_id       [FK]  |     | action_name         |
    |                     |     | product_id    [FK]  |  |--| session_id    [FK]* |
    +---------------------+     | session_id    [FK]* |--|  |                     |
                                |                     |     +---------------------+
                                +---------------------+    
    

    As you can see above, I'm joining first with carts and them with actions because only the table carts has both, user and session data.

    The [FK]* next to the session_id on carts and actions could seem as a foreign key but in this case it's not - 'cause there's no separate table for sessions where it would be placed as an PK (primary key)

    You asked about join - it is the same as inner join. INNER JOIN creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate.

    This is a possible content of the tables

    +------------------------+
    | users                  |
    +------------------------+
    | id   | email           |
    +------+-----------------+
    | 1    | first@mail.org  |
    | 2    | second@mail.org |
    | 3    | third@mail.org  |
    +------+-----------------+
    
    +------------------------------------------+
    | carts                                    |
    +------------------------------------------+
    | id   | user_id | product_id | session_id |
    +------+---------+------------+------------+
    | 1    | 1       | 5          | 1aaaa      |
    | 2    | 2       | 5          | 2ffff      |
    | 3    | 3       | 8          | 3ddddd     |
    | 4    | 1       | 5          | 1aaaaa     |
    | 5    | 3       | 9          | 3bbbbb     |
    | 6    | 1       | 6          | 1ccccc     |
    +------+---------+------------+------------+
    
    +-------------------------------+
    | actions                       |
    +-------------------------------+
    | id   | name      | session_id |
    +------+-----------+------------+
    |  1   | ADD       | 1aaaa      |
    |  2   | ADD       | 2ffff      |
    |  3   | SENDMAIL  | 3ddddd     |
    |  4   | ADD       | 3ddddd     |
    |  5   | SENDMAIL  | 2ffff      |
    |  6   | ADD       | 1aaaaa     |
    |  7   | REMOVE    | 3ddddd     |
    |  8   | ADD       | 1ccccc     |
    |  9   | ADD       | 3bbbbb     |
    | 10   | SENDMAIL  | 3bbbbb     |
    +------+-----------+------------+
    

    As you can see, there are six products in the table carts and exactly six add actions in the table actions. Furthermore, as you can see user with an id=1 bought three products but not at the same time, since there are two sessions; user with an id=3 as well, bought these two products in different times etc...

    The SQL statement

    SELECT u.user_id, c.session_id, c.cart_id, a.impression_id, a.action_name, u.email
    FROM users AS u
    INNER JOIN carts AS c ON c.user_id = u.user_id
    INNER JOIN actions AS a ON a.session_id = c.session_id
    ORDER BY u.user_id, c.session_id, c.cart_id
    

    Results:

    +---------+------------+---------+---------------+-------------+-----------------+
    | user_id | session_id | cart_id | impression_id | action_name | email           |
    +---------+------------+---------+---------------+-------------+-----------------+
    | 1       | 1aaaa      | 1       | 1             | ADD         | first@mail.org  |
    | 1       | 1aaaa      | 1       | 6             | ADD         | first@mail.org  |
    | 1       | 1aaaa      | 4       | 1             | ADD         | first@mail.org  |
    | 1       | 1aaaa      | 4       | 6             | ADD         | first@mail.org  |
    | 1       | 1cccc      | 6       | 8             | ADD         | first@mail.org  |
    | 2       | 2ffff      | 2       | 5             | SENDMAIL    | second@mail.org |
    | 2       | 2ffff      | 2       | 2             | ADD         | second@mail.org |
    | 3       | 3bbbb      | 5       | 9             | ADD         | third@mail.org  |
    | 3       | 3bbbb      | 5       | 10            | SENDMAIL    | third@mail.org  |
    | 3       | 3dddd      | 3       | 3             | SENDMAIL    | third@mail.org  |
    | 3       | 3dddd      | 3       | 4             | ADD         | third@mail.org  |
    | 3       | 3dddd      | 3       | 7             | REMOVE      | third@mail.org  |
    +---------+------------+---------+---------------+-------------+-----------------+
    

    Note: There's no guarantee for session uniqueness.

    (Updated) Working SQL Fiddle


    UPDATE: (Finding and deleting duplicates)

    I've updated the SQL Fiddle in order to simulate duplicate records (when user added the same product within the same session). With the following statement you'll be able to retrieve those duplicated rows.

    SELECT c.card_id, c.user_id, c.product_id, c.session_id, a.action_name, a.impression_id
    FROM cards As c
    INNER JOIN actions AS a ON a.session_id = c.session_id
    GROUP BY c.user_id, c.product_id, c.session_id, a.action_name
    HAVING count(*) > 1
    

    Results:

    +---------+------------+------------+------------+-------------+-----------------+
    | card_id | user_id    | product_id | session_id | action_name | impression_id   |
    +---------+------------+------------+------------+-------------+-----------------+
    | 1       | 1          | 5          | 1aaaa      | ADD         | 1               |
    | 6       | 1          | 6          | 1cccc      | ADD         | 8               |
    +---------+------------+------------+------------+-------------+-----------------+
    

    In the SELECT part of the statement above you may omit everything except card_id and impression_id. Deleting these two duplicates in one statement is a bit tricky since you can't modify the same table selected in a sub-query within the same query. I would avoid the tricky part in this case (which involves another inner sub-query) and would delete duplicates using separate statements as following

    -- delete duplicates from cards
    --
    DELETE FROM WHERE card_id IN (1,6)
    
    -- delete duplicates from actions
    --
    DELETE FROM WHERE card_id IN (1,8)
    

    Even better, you could check if the user already has been added a selected product and don't add it twice.