sqlsql-servergroup-by

Custom Aggregate Function (to use with GROUP BY)


I'm wondering if it's possible to create a custom SQL group by function.

For example, I often need to find the most common value within a group. While I can achieve this using CTEs or subqueries, having a custom group by function like the example below would be much cleaner:

SELECT company_name, most_common(first_name) AS most_common_first_name
FROM company
GROUP BY company_name

This feels like something that should be possible, however, I haven't found many promising results online. If this is something that could be done, does anyone have any site refers or examples?


Solution

  • Using Squirrel's link, I was able to create a custom aggregate function. In terms of performance, it runs slightly slower than an equivalent CTE query; from a couple of tests about 5x slower. However this could likely could be improved with further code optimization. That said, using a custom aggregate is much cleaner and more readable SQL code, which can be worth the trade-off depending on your use case.

    For anyone interested in implementing this themselves, here’s an example.

    First, create a Class Library in C# or VB.NET (I used VB.NET):

    Imports Microsoft.SqlServer.Server
    Imports System.Data.SqlTypes
    Imports System.IO
    
    <Serializable(), SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToNulls:=True, IsInvariantToDuplicates:=False, IsInvariantToOrder:=False, MaxByteSize:=-1)>
    Public Class Majority
    
        Implements IBinarySerialize
    
        Private ValueCount As Dictionary(Of String, Integer)
    
        Public Sub Init()
    
            ValueCount = New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
    
        End Sub
    
        Public Sub Accumulate(ByVal value As SqlString)
    
            If value.IsNull Then Return
    
            Dim key As String = value.Value
    
            If Not ValueCount.ContainsKey(key) Then Me.ValueCount(key) = 0
    
            Me.ValueCount(key) += 1
    
        End Sub
    
        Public Sub Merge(ByVal other As Majority)
    
            For Each keyValuePair In other.ValueCount
    
                If Me.ValueCount.ContainsKey(keyValuePair.Key) Then
    
                    Me.ValueCount(keyValuePair.Key) += keyValuePair.Value
    
                Else
    
                    Me.ValueCount(keyValuePair.Key) = keyValuePair.Value
    
                End If
    
            Next
    
        End Sub
    
        Public Function Terminate() As SqlString
    
            If ValueCount Is Nothing OrElse ValueCount.Count = 0 Then Return SqlString.Null
    
            Dim mostCommonValue As String = Nothing
            Dim highestCount As Integer = -1
    
            For Each keyValuePair In Me.ValueCount
    
                If keyValuePair.Value > highestCount OrElse (keyValuePair.Value = highestCount And String.Compare(keyValuePair.Key, mostCommonValue, StringComparison.OrdinalIgnoreCase) < 0) Then
    
                    highestCount = keyValuePair.Value
                    mostCommonValue = keyValuePair.Key
    
                End If
    
            Next
    
            Return New SqlString(mostCommonValue)
    
        End Function
    
        Public Sub Read(ByVal Reader As BinaryReader) Implements IBinarySerialize.Read
    
            Dim count As Integer = Reader.ReadInt32()
    
            Me.ValueCount = New Dictionary(Of String, Integer)(count, StringComparer.OrdinalIgnoreCase)
    
            For index As Integer = 0 To count - 1
    
                Dim key As String = Reader.ReadString()
                Dim value As Integer = Reader.ReadInt32()
                Me.ValueCount(key) = value
    
            Next
    
        End Sub
    
        Public Sub Write(ByVal Writer As BinaryWriter) Implements IBinarySerialize.Write
    
            Writer.Write(Me.ValueCount.Count)
    
            For Each keyValuePair In Me.ValueCount
    
                Writer.Write(keyValuePair.Key)
                Writer.Write(keyValuePair.Value)
    
            Next
    
        End Sub
    
    End Class
    

    Then, create the aggregate in your target database:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
    
    DECLARE @dll_hash VARBINARY(8000) = (SELECT HASHBYTES('SHA2_512', BulkColumn) FROM OPENROWSET(BULK 'C:\custom_sql_aggregate.dll', SINGLE_BLOB) AS dll);
    
    EXEC sp_add_trusted_assembly 
    @hash = @dll_hash, 
    @description = N'custom_sql_aggregate Custom User Aggregate';
    
    CREATE ASSEMBLY custom_sql_aggregate 
    FROM 'C:\custom_sql_aggregate.dll';
    
    CREATE AGGREGATE majority(@input NVARCHAR (200))
    RETURNS NVARCHAR (MAX)
    EXTERNAL NAME custom_sql_aggregate.[custom_sql_aggregate.Majority];
    

    Finally, use the aggregate in your queries like so:

    SELECT city, dbo.majority(first_name) AS 'most_common_first_name'
    FROM company
    GROUP BY city