mysql

MySQL SUM of one column, DISTINCT of ID column


I'm trying to create a summary report of our orders but having trouble extracting all my required data in a single query.

The data I'd like to extract:

Orders table (simplified for this example)

| orderId | deliveryTotal | total |
|---------|---------------|-------|
| 1       | 5             | 15    |
| 2       | 5             | 15    |
| 3       | 7.50          | 27.50 |

Order items table

| orderItemId | orderId | productId | salePrice | quantity |
|-------------|---------|-----------|-----------|----------|
| 1           | 1       | 1         | 10        | 1        |
| 2           | 2       | 1         | 10        | 1        |
| 3           | 3       | 1         | 10        | 1        |
| 4           | 3       | 2         | 10        | 1        |

My current query for extracting this data is

SELECT
    SUM(i.salePrice * i.quantity) as subtotal,
    SUM(DISTINCT o.deliveryTotal) as deliveryTotal,
    COUNT(DISTINCT o.orderId) as orders,
    SUM(i.quantity) as quantity
FROM orderItems i
INNER JOIN orders o ON o.orderId = i.orderId

Which results in a correct subtotal, order count and quantity sum. But delivery total is returned as 12.50 when I'm after 17.50. If I do SUM(o.deliveryTotal) it will return 25.

EDIT: Desired results

| subtotal | deliveryTotal | orders | quantity |
|----------|---------------|--------|----------|
| 40.00    | 17.50         | 3      | 4        |

Solution

  • Because of the join, the SUM(DISTINCT deliveryTotal) aggregate is being applied to a rowset including the values 5, 5, 7.5, 7.5 (distinct 5 + 7.5 = 12.5).

    The rows your SUM() acted on become more apparent if you simply do

    SELECT o.*
    FROM orderItems i
    INNER JOIN orders o ON o.orderId = i.orderId
    

    Instead you are asking for the SUM() of all the values in deliveryTotal, irrespective of their position in the join with orderItems. That means you need to apply the aggregate at a different level.

    Since you are not intending to add a GROUP BY later, the easiest way to do that is to use a subselect whose purpose is only to get the SUM() across the whole table.

    SELECT
        SUM(i.salePrice * i.quantity) as subtotal,
        -- deliveryTotal sum as a subselect
        (SELECT SUM(deliveryTotal) FROM orders) as deliveryTotal,
        COUNT(DISTINCT o.orderId) as orders,
        SUM(i.quantity) as quantity
    FROM orderItems i
    INNER JOIN orders o ON o.orderId = i.orderId
    

    Subselects are usually discouraged but there won't be a significant performance penalty for the subselect, none different from the alternative methods of using a join for it. The calculation has to be done on a separate aggregate from the existing join no matter what. Other methods would place a subquery CROSS JOIN in the FROM clause, which performs the same thing we placed here in the subselect. Performance would be the same.