mysqlamember

Querying aMember


I'm trying to pull a query from aMember. We have a few tables:

am_user - Contains user data - PK = user_id
am_invoice - Contains one line per invoice - PK invoice_id, FK to user_id
am_invoice_item - Contains one line per product - FK to invoice_id

We currently sell 2 different products using 4 different product_ids (it's software, and the products are translated in 2 different versions). Users can buy only episode 1, only episode 2, or both. (We're not interested in those who did not make a purchase yet).

My query looks like the following:

 SELECT 
     u.user_id, login, u.added
    , CASE ii.item_id 
        WHEN 3 THEN i.tm_added 
        WHEN 5 THEN i.tm_added 
        ELSE NULL END AS Episode1Bought
    , CASE ii2.item_id
        WHEN 7 THEN i2.tm_added
        WHEN 6 THEN i2.tm_added
        ELSE NULL END AS Episode2Bought
FROM 
    am_user u 
        LEFT JOIN am_invoice i ON u.user_id = i.user_id
        LEFT JOIN am_invoice_item ii ON i.invoice_id = ii.invoice_id AND ii.item_id IN (3, 5)
        LEFT JOIN am_invoice i2 ON u.user_id = i2.user_id 
        LEFT JOIN am_invoice_item ii2 ON i2.invoice_id = ii2.invoice_id AND ii2.item_id IN (7, 6)
WHERE i.status = 1 AND i2.status = 1 -- Only paid invoices
ORDER BY user_id

However, this ends up returning up to 4 rows for customers who bought both episodes. For others, who only bought 1 episode, it works correctly:

1   user1   2013-12-07 18:06:01   Episode1_EN   2014-01-11 13:28:19   Episode2_DK   2014-02-15 10:22:30
1   user1   2013-12-07 18:06:01   NULL          NULL                  Episode2_DK   2014-02-15 10:22:30
1   user1   2013-12-07 18:06:01   Episode1_EN   2014-01-11 13:28:19   NULL          NULL    
1   user1   2013-12-07 18:06:01   NULL          NULL                  NULL          NULL            

I'd like for it to only return the row containing the details for both the purchases:

1   user1   2013-12-07 18:06:01   Episode1_EN   2014-01-11 13:28:19   Episode2_DK   2014-02-15 10:22:30

I bet I'm overlooking something pretty simple, but for the life of me cannot figure out where I went wrong. Any ideas how I can get rid of the redundant rows?


Solution

  • I managed to solve it by using two separate queries; one for product1, and another for product2. In the product1 query, I add a NULL column with an alias of Episode2Bought. In the product2 query, I do the same, but for Product1Bought. This way I can UNION thw two resultsets.

    However, this still lead to duplicate rows in case a user purchased both products. To solve that, I wrap a SELECT around that resultset, use MAX() functions on the ProductBought columns to remove the NULLS and use a GROUP BY to merge the duplicate rows together.

    The end query looks like this:

    SELECT 
        user_id
        , login
        , added
        , MAX(Episode1Bought) AS Episode1Bought
        , MAX(Episode2Bought) AS Episode2Bought
    FROM
    (
    
    SELECT 
    u.user_id, login, u.added
        , CASE ii.item_id 
            WHEN 3 THEN i.tm_added 
            WHEN 5 THEN i.tm_added 
            ELSE NULL END AS Episode1Bought
        , NULL AS Episode2Bought
    
    FROM 
        am_user u 
            JOIN am_invoice i ON u.user_id = i.user_id
            JOIN am_invoice_item ii ON i.invoice_id = ii.invoice_id AND ((ii.item_id IN (3, 5)) )
        WHERE i.status = 1
    UNION  
    SELECT 
    u.user_id, login, u.added
        , NULL AS Episode1Bought
        , CASE ii.item_id 
            WHEN 6 THEN i.tm_added 
            WHEN 7 THEN i.tm_added 
            ELSE NULL END AS Episode2Bought
    FROM 
        am_user u 
            JOIN am_invoice i ON u.user_id = i.user_id
            JOIN am_invoice_item ii ON i.invoice_id = ii.invoice_id AND ((ii.item_id IN (6, 7)) )
        WHERE i.status = 1
    )
    as a 
    GROUP BY user_id, login, added
    ORDER BY user_id
    ;
    

    This correctly returns the desired result:

    1   user1   2013-12-07 18:06:01   Episode1_EN   2014-01-11 13:28:19   Episode2_DK   2014-02-15 10:22:30