I'm trying to pull a query from aMember. We have a few tables:
am_user - Contains user data - PK = user_id
am_invoice - Contains one line per invoice - PK invoice_id, FK to user_id
am_invoice_item - Contains one line per product - FK to invoice_id
We currently sell 2 different products using 4 different product_ids (it's software, and the products are translated in 2 different versions). Users can buy only episode 1, only episode 2, or both. (We're not interested in those who did not make a purchase yet).
My query looks like the following:
SELECT
u.user_id, login, u.added
, CASE ii.item_id
WHEN 3 THEN i.tm_added
WHEN 5 THEN i.tm_added
ELSE NULL END AS Episode1Bought
, CASE ii2.item_id
WHEN 7 THEN i2.tm_added
WHEN 6 THEN i2.tm_added
ELSE NULL END AS Episode2Bought
FROM
am_user u
LEFT JOIN am_invoice i ON u.user_id = i.user_id
LEFT JOIN am_invoice_item ii ON i.invoice_id = ii.invoice_id AND ii.item_id IN (3, 5)
LEFT JOIN am_invoice i2 ON u.user_id = i2.user_id
LEFT JOIN am_invoice_item ii2 ON i2.invoice_id = ii2.invoice_id AND ii2.item_id IN (7, 6)
WHERE i.status = 1 AND i2.status = 1 -- Only paid invoices
ORDER BY user_id
However, this ends up returning up to 4 rows for customers who bought both episodes. For others, who only bought 1 episode, it works correctly:
1 user1 2013-12-07 18:06:01 Episode1_EN 2014-01-11 13:28:19 Episode2_DK 2014-02-15 10:22:30
1 user1 2013-12-07 18:06:01 NULL NULL Episode2_DK 2014-02-15 10:22:30
1 user1 2013-12-07 18:06:01 Episode1_EN 2014-01-11 13:28:19 NULL NULL
1 user1 2013-12-07 18:06:01 NULL NULL NULL NULL
I'd like for it to only return the row containing the details for both the purchases:
1 user1 2013-12-07 18:06:01 Episode1_EN 2014-01-11 13:28:19 Episode2_DK 2014-02-15 10:22:30
I bet I'm overlooking something pretty simple, but for the life of me cannot figure out where I went wrong. Any ideas how I can get rid of the redundant rows?
I managed to solve it by using two separate queries; one for product1
, and another for product2
. In the product1
query, I add a NUL
L column with an alias of Episode2Bought
. In the product2
query, I do the same, but for Product1Bought
. This way I can UNION
thw two resultsets.
However, this still lead to duplicate rows in case a user purchased both products. To solve that, I wrap a SELECT around that resultset, use MAX()
functions on the ProductBought columns to remove the NULLS and use a GROUP BY to merge the duplicate rows together.
The end query looks like this:
SELECT
user_id
, login
, added
, MAX(Episode1Bought) AS Episode1Bought
, MAX(Episode2Bought) AS Episode2Bought
FROM
(
SELECT
u.user_id, login, u.added
, CASE ii.item_id
WHEN 3 THEN i.tm_added
WHEN 5 THEN i.tm_added
ELSE NULL END AS Episode1Bought
, NULL AS Episode2Bought
FROM
am_user u
JOIN am_invoice i ON u.user_id = i.user_id
JOIN am_invoice_item ii ON i.invoice_id = ii.invoice_id AND ((ii.item_id IN (3, 5)) )
WHERE i.status = 1
UNION
SELECT
u.user_id, login, u.added
, NULL AS Episode1Bought
, CASE ii.item_id
WHEN 6 THEN i.tm_added
WHEN 7 THEN i.tm_added
ELSE NULL END AS Episode2Bought
FROM
am_user u
JOIN am_invoice i ON u.user_id = i.user_id
JOIN am_invoice_item ii ON i.invoice_id = ii.invoice_id AND ((ii.item_id IN (6, 7)) )
WHERE i.status = 1
)
as a
GROUP BY user_id, login, added
ORDER BY user_id
;
This correctly returns the desired result:
1 user1 2013-12-07 18:06:01 Episode1_EN 2014-01-11 13:28:19 Episode2_DK 2014-02-15 10:22:30