sqlt-sqlsql-server-2017azure-sql-serverfor-json

How to aggregate and Join or Union into flat Json object with arrays?


I have data that looks like:

Customers Table

CustomerId  CustomerName CustomerEmail  
------------------------------------------
1           Ben          Ben@gmail.com
2           Robert       Robert@gmail.com
3           Paul         Paul@gmail.com

CustomerContacts Table

CustomerContactId  CustomerId  ContactName   ContactEmail
----------------------------------------------------------
99                 1           Lisa          Lisa@msn.com
98                 3           Jane          Jane@msn.com
97                 3           Wendy         Wendy@msn.com

Here's the result I'm looking for:

[
    {
        "CustomerId": 1,
        "Names": [ "Ben","Lisa" ],
        "Emails": [ "Ben@gmail.com","Lisa@msn.com" ]
    },
    {
        "CustomerId": 2,
        "Names": [ "Robert" ],
        "Emails": [ "Robert@gmail.com" ]
    },
    {
        "CustomerId": 3,
        "Names": [ "Paul","Jane","Wendy" ],
        "Emails": [ "Paul@gmail.com","Jane@msn.com","Wendy@msn.com" ]
    }
]

What I have tried: I'm embarrassed to say I'm not even close:

SELECT 
    Customers.CustomerId,
     STUFF( ISNULL(',' + Customers.CustomerName, '') + ISNULL(',' + CustomerContacts.ContactName, ''),1,1,'') as Names
FROM Customers
FULL JOIN CustomerContacts
ON Customers.CustomerId = CustomerContacts.CustomerId
GROUP BY Customers.CustomerId;

Solution

  • Unfortunately, SQL Server jumped on the JSON wagon a bit late in the game (started built in support only in 2016 version), which means it's JSON support is still not great (Though what it does know how to do it does great).
    Personally, I don't know of any built-in way to create a JSON array of values as a result of a query
    ({"Name":["Value1", "Value2"...]}) though it's quite easy to generate an array of key-value pairs
    (["Name":"Value1", "Name":"Value2"...]) - at least not by using the FOR JSON clause.

    However, Since you're working with 2017 and Azure versions, it is quite easy to generate such arrays yourself, using string_agg (It's a bit more cumbersome in earlier versions - using for xml path and stuff to aggregate strings).
    Note the usage of WITHIN GROUP (ORDER BY CustomerContactId), that forces a specific order to the aggregated string. Without it, the order of the names and emails of the contacts would be arbitrary (though I'm pretty sure they will probably still both be ordered the same way within one run of the query).

    That being said - here's my proposed solution:

    First, create and populate sample tables (Please save is this step in your future questions):

    CREATE TABLE Customers (
        [CustomerId] int, 
        [CustomerName] varchar(6), 
        [CustomerEmail] varchar(16)
    );
        
    INSERT INTO Customers ([CustomerId], [CustomerName], [CustomerEmail]) VALUES
    (1, 'Ben', 'Ben@gmail.com'),
    (2, 'Robert', 'Robert@gmail.com'),
    (3, 'Paul', 'Paul@gmail.com');
    
    CREATE TABLE CustomerContacts (
        [CustomerContactId] int, 
        [CustomerId] int, 
        [ContactName] varchar(5), 
        [ContactEmail] varchar(13)
    );
        
    INSERT INTO CustomerContacts ([CustomerContactId], [CustomerId], [ContactName], [ContactEmail]) VALUES
    (99, 1, 'Lisa', 'Lisa@msn.com'),
    (98, 3, 'Jane', 'Jane@msn.com'),
    (97, 3, 'Wendy', 'Wendy@msn.com');
    

    Then, use a query with FOR JSON PATH to get a json output.
    The trick here is to generate the inner arrays by concatenating the CustomerName / CustomerEmail with the result of a STRING_AGG subquery of the relevant column in the CustomerContacts table.
    Note the JSON_QUERY wrapper around these columns. They are needed to prevent SQL Server to escape the " chars in the json output - by telling it that the content is proper JSON.
    Also, note the usage of ISNULL to act as a LEFT JOIN - you'll get all the customers, even if they don't have a corresponding record in the CustomerContacts table.

    SELECT  C.CustomerId, 
            JSON_QUERY(
                '["' + C.CustomerName + ISNULL('","'+ 
                (
                    SELECT STRING_AGG(CC.ContactName, '","') WITHIN GROUP (ORDER BY CustomerContactId)
                    FROM CustomerContacts As CC
                    WHERE CC.CustomerId = C.CustomerId
                ), '') + '"]'
            ) As Names,
            JSON_QUERY(
                '["' + C.CustomerEmail + ISNULL('","'+ 
                (
                    SELECT STRING_AGG(CC.ContactEmail, '","') WITHIN GROUP (ORDER BY CustomerContactId)
                    FROM CustomerContacts As CC
                    WHERE CC.CustomerId = C.CustomerId
                ), '') + '"]'
            ) As Emails
    FROM Customers AS C
    FOR JSON PATH
    

    Result:

    [
        {
            "CustomerId": 1,
            "Names": ["Ben", "Lisa"],
            "Emails": ["Ben@gmail.com", "Lisa@msn.com"]
        }, {
            "CustomerId": 2,
            "Names": ["Robert"],
            "Emails": ["Robert@gmail.com"]
        }, {
            "CustomerId": 3,
            "Names": ["Paul", "Wendy", "Jane"],
            "Emails": ["Paul@gmail.com", "Wendy@msn.com", "Jane@msn.com"]
        }
    ]
    

    You can see a live demo on DB<>Fiddle
    (Unfortunately, json output isn't nicely indented but it's valid none the less)