So I have an interesting scenario. There are 2 events, for which one of the Married EventType I'd like to join associated married couples of the house, grouped by an identifier (HouseID; 2 people in the same house), the EventType, and the EventDate. For the case such as an EventType of Birthday, the 2 residents of the house would not be combined in the same row. In the case of an EventType of Wedding, combine the name result of 2 rows (2 rules below based on LastName) into 1. It's possible to have only 1 person for a HouseID for the EventType of Wedding or Birthday, so therefore they would list as an individual row. The combining rules for EventType of Wedding would be as follows:
The combining of the names would be contingent on the HouseID, EventType, and EventDate being the same, specific only to Wedding. This is because it's possible for 2 people to live in a house that are married, but not to each other which we base off the EventDate; we assume the EventDate is the indicator that they are married to each other. The example table input is as follows:
DECLARE @t TABLE (
HouseID INT,
FirstName NVARCHAR(64),
LastName NVARCHAR(64),
EventType NVARCHAR(64),
EventDate DATE
);
INSERT INTO @t (HouseID, FirstName, LastName, EventType, EventDate)
VALUES
(1, 'Will', 'Stanton', 'Birthday', '1974-01-05'),
(1, 'Mary', 'Stanton', 'Birthday', '1980-05-22'),
(2, 'Jason', 'Stockmore', 'Birthday', '1987-12-07'),
(3, 'Mark', 'Mellony', 'Wedding', '2021-04-04'),
(3, 'Stacy', 'Mellony', 'Wedding', '2021-04-04'),
(4, 'Stephen', 'Johnson', 'Wedding', '2012-01-30'),
(4, 'Janetsy', 'Johnson', 'Wedding', '2012-01-30'),
(5, 'George', 'Jackson', 'Wedding', '2009-11-15'),
(5, 'Sally', 'Mistmoore', 'Wedding', '2009-11-15'),
(6, 'Sandy', 'Katz', 'Wedding', '2010-03-19'),
(6, 'Jeff', 'Trilov', 'Wedding', '2016-09-09'),
(7, 'Sandra', 'Kirchbaum', 'Wedding', '2011-05-22'),
(8, 'Jessica', 'Bower', 'Birthday', '1996-02-26'),
(8, 'Frank', 'Fjorn', 'Birthday', '1969-07-19');
The ideal result based on the input table would resemble:
| HouseID | FinalName | EventType | EventDate |
| ------- | ---------------------------------- | --------- | ---------- |
| 1 | Mary Stanton | Birthday | 1974-01-05 |
| 1 | Will Stanton | Birthday | 1980-05-22 |
| 2 | Jason Stockmore | Birthday | 1987-12-07 |
| 3 | Mark and Stacy Mellony | Wedding | 2021-04-04 |
| 4 | Stephen and Janetsy Johnson | Wedding | 2012-01-30 |
| 5 | George Jackson and Sally Mistmoore | Wedding | 2009-11-15 |
| 6 | Sandy Katz | Wedding | 2010-03-19 |
| 6 | Jeff Trilov | Wedding | 2016-09-09 |
| 7 | Sandra Kirchbaum | Wedding | 2011-05-22 |
| 8 | Jessica Bower | Birthday | 1996-02-26 |
| 8 | Frank Fjorn | Birthday | 1969-07-19 |
I have tried a couple of approaches; one of which is using an update statement to update the First Name and update a subsequent FirstName based on Row number using a previously set @Values variable, unioning the result of the types that do not combine (in this case, Birthday). Here is where I built the names then used the MAX() aggregation to select the larger result:
SELECT HouseID,
FirstName
, LastName
, EventType
, EventDate
, RowNum = ROW_NUMBER() OVER (PARTITION BY LastName, EventType ORDER BY 1/0)
, Values1 = CAST(NULL AS VARCHAR(MAX))
INTO #EntityValues1
FROM @t
WHERE EventType = 'Wedding'
UPDATE #EntityValues1
SET @Values1 = Values1 =
CASE WHEN RowNum = 1
THEN FirstName
ELSE @Values1 + ' and ' + FirstName
END
However this example only works with combining FirstName1 + FirstName2 + LastName (in a subsequent query with MAX(Values1) + ' ' + LastName. I had to do a subsequent query to take the approach where I am combining names that do not have the same last name. I know this particular query is a bit tricky, but I'm wondering if there's any magic I'm missing out on. I've seen some suggestions that use a FOR XML approach with STUFF involved, and some other suggestions, but this one appears to be a tough one.
Here is an answer, with a working demo
;WITH
[DoubleWeddings] AS (
SELECT
[HouseID]
FROM
@t
WHERE
[EventType] = 'Wedding'
GROUP BY
[HouseID],
[EventDate]
HAVING
COUNT(*) = 2
),
[DoubleWeddingsSameLastName] AS (
SELECT
T.[HouseID]
FROM
[DoubleWeddings] DW
JOIN
@t T
ON T.[HouseID] = DW.[HouseID]
GROUP BY
T.[HouseID],
T.[LastName]
HAVING
COUNT(*) = 2
),
[DoubleWeddingsDifferentLastName] AS (
SELECT [HouseID] FROM [DoubleWeddings]
EXCEPT
SELECT [HouseID] FROM [DoubleWeddingsSameLastName]
),
[Couples] AS (
SELECT
T.[HouseID],
ROW_NUMBER() OVER (PARTITION BY T.[HouseID] ORDER BY 1/0) [RN],
T.[FirstName],
T.[LastName],
T.[EventType],
T.[EventDate]
FROM
[DoubleWeddings] DW
JOIN
@t T
ON T.[HouseID] = DW.[HouseID]
)
SELECT
DWSL.[HouseID],
FORMATMESSAGE(
'%s and %s %s',
F.[FirstName],
S.[FirstName],
S.[LastName]) [FinalName],
F.[EventType],
F.[EventDate]
FROM
[DoubleWeddingsSameLastName] DWSL
JOIN
[Couples] F
ON F.[HouseID] = DWSL.[HouseID] AND F.[RN] = 1
JOIN
[Couples] S
ON S.[HouseID] = DWSL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
DWDL.[HouseID],
FORMATMESSAGE(
'%s %s and %s %s',
F.[FirstName],
F.[LastName],
S.[FirstName],
S.[LastName]) [FinalName],
F.[EventType],
F.[EventDate]
FROM
[DoubleWeddingsDifferentLastName] DWDL
JOIN
[Couples] F
ON F.[HouseID] = DWDL.[HouseID] AND F.[RN] = 1
JOIN
[Couples] S
ON S.[HouseID] = DWDL.[HouseID] AND S.[RN] = 2
UNION ALL
SELECT
T.[HouseID],
FORMATMESSAGE(
'%s %s',
T.[FirstName],
T.[LastName]) [FinalName],
T.[EventType],
T.[EventDate]
FROM
@t T
LEFT JOIN
[DoubleWeddings] DW
ON DW.[HouseID] = T.[HouseID]
WHERE
DW.[HouseID] IS NULL
ORDER BY
[HouseID],
[EventDate],
[FinalName]
There is one issue, I've used the ORDER BY 1/0
technique to skip providing an order for the ROW_NUMBER()
which tends to assign the row numbers in the order of the underlying data. However, this is not guaranteed, and could very depending on the parallelization of the query.
It would be better if the order of the combination was provided by a column in the data, however, none is present in the example.