I'm wondering if it's possible to create a custom SQL group by function.
For example, I often need to find the most common value within a group. While I can achieve this using CTEs or subqueries, having a custom group by function like the example below would be much cleaner:
SELECT company_name, most_common(first_name) AS most_common_first_name
FROM company
GROUP BY company_name
This feels like something that should be possible, however, I haven't found many promising results online. If this is something that could be done, does anyone have any site refers or examples?
Using Squirrel's link, I was able to create a custom aggregate function. In terms of performance, it runs slightly slower than an equivalent CTE query; from a couple of tests about 5x slower. However this could likely could be improved with further code optimization. That said, using a custom aggregate is much cleaner and more readable SQL code, which can be worth the trade-off depending on your use case.
For anyone interested in implementing this themselves, here’s an example.
First, create a Class Library in C# or VB.NET (I used VB.NET):
Imports Microsoft.SqlServer.Server
Imports System.Data.SqlTypes
Imports System.IO
<Serializable(), SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToNulls:=True, IsInvariantToDuplicates:=False, IsInvariantToOrder:=False, MaxByteSize:=-1)>
Public Class Majority
Implements IBinarySerialize
Private ValueCount As Dictionary(Of String, Integer)
Public Sub Init()
ValueCount = New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
End Sub
Public Sub Accumulate(ByVal value As SqlString)
If value.IsNull Then Return
Dim key As String = value.Value
If Not ValueCount.ContainsKey(key) Then Me.ValueCount(key) = 0
Me.ValueCount(key) += 1
End Sub
Public Sub Merge(ByVal other As Majority)
For Each keyValuePair In other.ValueCount
If Me.ValueCount.ContainsKey(keyValuePair.Key) Then
Me.ValueCount(keyValuePair.Key) += keyValuePair.Value
Else
Me.ValueCount(keyValuePair.Key) = keyValuePair.Value
End If
Next
End Sub
Public Function Terminate() As SqlString
If ValueCount Is Nothing OrElse ValueCount.Count = 0 Then Return SqlString.Null
Dim mostCommonValue As String = Nothing
Dim highestCount As Integer = -1
For Each keyValuePair In Me.ValueCount
If keyValuePair.Value > highestCount OrElse (keyValuePair.Value = highestCount And String.Compare(keyValuePair.Key, mostCommonValue, StringComparison.OrdinalIgnoreCase) < 0) Then
highestCount = keyValuePair.Value
mostCommonValue = keyValuePair.Key
End If
Next
Return New SqlString(mostCommonValue)
End Function
Public Sub Read(ByVal Reader As BinaryReader) Implements IBinarySerialize.Read
Dim count As Integer = Reader.ReadInt32()
Me.ValueCount = New Dictionary(Of String, Integer)(count, StringComparer.OrdinalIgnoreCase)
For index As Integer = 0 To count - 1
Dim key As String = Reader.ReadString()
Dim value As Integer = Reader.ReadInt32()
Me.ValueCount(key) = value
Next
End Sub
Public Sub Write(ByVal Writer As BinaryWriter) Implements IBinarySerialize.Write
Writer.Write(Me.ValueCount.Count)
For Each keyValuePair In Me.ValueCount
Writer.Write(keyValuePair.Key)
Writer.Write(keyValuePair.Value)
Next
End Sub
End Class
Then, create the aggregate in your target database:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
DECLARE @dll_hash VARBINARY(8000) = (SELECT HASHBYTES('SHA2_512', BulkColumn) FROM OPENROWSET(BULK 'C:\custom_sql_aggregate.dll', SINGLE_BLOB) AS dll);
EXEC sp_add_trusted_assembly
@hash = @dll_hash,
@description = N'custom_sql_aggregate Custom User Aggregate';
CREATE ASSEMBLY custom_sql_aggregate
FROM 'C:\custom_sql_aggregate.dll';
CREATE AGGREGATE majority(@input NVARCHAR (200))
RETURNS NVARCHAR (MAX)
EXTERNAL NAME custom_sql_aggregate.[custom_sql_aggregate.Majority];
Finally, use the aggregate in your queries like so:
SELECT city, dbo.majority(first_name) AS 'most_common_first_name'
FROM company
GROUP BY city