I am looking for a way to extract data with a single query and answering to two fundamental questions:
Suppose the following tables:
CREATE TABLE Invoice (
Id VARCHAR(3) PRIMARY KEY,
Pax INT NOT NULL
)
CREATE TABLE InvoiceLine (
Id VARCHAR(3) PRIMARY KEY,
InvoiceId VARCHAR(3) NOT NULL,
Activity VARCHAR(80) NOT NULL,
Amount DECIMAL(19,4) NOT NULL,
CONSTRAINT FK_InvoiceLine_InvoiceId FOREIGN KEY (InvoiceId) REFERENCES Invoice (Id)
)
And the following datas:
Invoice Table:
Id | Pax |
---|---|
001 | 2 |
002 | 2 |
003 | 4 |
InvoiceLine Table:
Id | InvoiceId | Activity | Amount |
---|---|---|---|
001 | 001 | Ticketing | 40.00 |
002 | 001 | Reservation | 10.00 |
003 | 002 | Ticketing | 30.00 |
004 | 002 | Ticketing | 30.00 |
005 | 002 | Reservation | 8.00 |
006 | 002 | Insurance | 6.60 |
007 | 003 | Reservation | 120.00 |
008 | 003 | Reservation | 40.00 |
INSERT INTO Invoice VALUES ('001', 2);
INSERT INTO Invoice VALUES ('002', 2);
INSERT INTO Invoice VALUES ('003', 4);
INSERT INTO InvoiceLine VALUES ('001', '001', 'Ticketing', 40);
INSERT INTO InvoiceLine VALUES ('002', '001', 'Reservation', 10);
INSERT INTO InvoiceLine VALUES ('003', '002', 'Ticketing', 30);
INSERT INTO InvoiceLine VALUES ('004', '002', 'Ticketing', 30);
INSERT INTO InvoiceLine VALUES ('005', '002', 'Reservation', 8);
INSERT INTO InvoiceLine VALUES ('006', '002', 'Insurance', 6.6);
INSERT INTO InvoiceLine VALUES ('007', '003', 'Ticketing', 120);
INSERT INTO InvoiceLine VALUES ('008', '003', 'Ticketing', 40);
My goal is to have a query to retrieve (in the same query):
I have found a way to obtain the correct result, but it requires to group by i.Id, which increase the number of rows returned (in my case I have millions of rows).
SELECT
i.Id,
il.Activity,
SUM(il.Amount) AS Amount,
AVG(i.Pax) AS Pax,
AVG(CAST(i.Pax AS float)) / COUNT(*) OVER (PARTITION BY i.Id) AS GlobalPax
FROM
Invoice AS i
JOIN InvoiceLine AS il ON i.Id = il.InvoiceId
GROUP BY
i.Id,
il.Activity
Which gives me:
Id | Activity | Amount | Pax | GlobalPax |
---|---|---|---|---|
001 | Reservation | 10.0000 | 2 | 1 |
001 | Ticketing | 40.0000 | 2 | 1 |
002 | Insurance | 6.6000 | 2 | 0.666666666666667 |
002 | Reservation | 8.0000 | 2 | 0.666666666666667 |
002 | Ticketing | 60.0000 | 2 | 0.666666666666667 |
003 | Ticketing | 160.0000 | 4 | 4 |
The results is almost perfect, but I don't need the Id column (required for the PARTITION BY clause).
A perfect result could be:
Activity | Amount | Pax | GlobalPax |
---|---|---|---|
Insurance | 6.6000 | 2 | 0.666666666666667 |
Reservation | 18.0000 | 4 | 1.666666666666667 |
Ticketing | 260.0000 | 8 | 5.666666666666667 |
Thanks
You should apply next grouping by Activity
, if don't need the Id column.
See example
select Activity
,sum(Amount) as Amount
,sum(Pax) as Pax
,sum(GlobalPax) as GlobalPax
from( -- your query
SELECT
i.Id,
il.Activity,
SUM(il.Amount) AS Amount,
AVG(i.Pax) AS Pax,
AVG(CAST(i.Pax AS float)) / COUNT(*) OVER (PARTITION BY i.Id) AS GlobalPax
FROM
Invoice AS i
JOIN InvoiceLine AS il ON i.Id = il.InvoiceId
GROUP BY i.Id, il.Activity
)a
group by Activity
Activity | Amount | Pax | GlobalPax |
---|---|---|---|
Insurance | 6.6000 | 2 | 0.666666666666667 |
Reservation | 18.0000 | 4 | 1.66666666666667 |
Ticketing | 260.0000 | 8 | 5.66666666666667 |