From https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16 it says "However, each table or view column in any nonaggregate expression in the list must be included in the GROUP BY list". Exactly as you'd expect (there seem to be a couple of exceptions, probably MSSQL specific, regarding string concatenation but they don't seem to be relevant here).
At work I intended to write this:
SELECT
max(STATS_DATE(object_id, stats_id)),
object_name(object_id)
FROM sys.stats
GROUP BY object_name(object_id)
Which works as expected. What I actually wrote by accident was this
SELECT
max(STATS_DATE(object_id, stats_id)),
object_name(object_id)
FROM sys.stats
GROUP BY object_id -- should have been object_name(object_id)
The SELECT contains object_name(object_id)
but the GROUP BY contains object_id
, which is definitely not in the SELECT. ISTM this should have been rejected, but it runs fine.
I don't get it. Is there some support for functional dependencies that I'm not aware of? Is there even a functional dependency between object_id
and object_name(object_id)
(I don't think so because object_name(object_id)
might not necessarily be unique, although my understanding of FDs is pretty poor).
What's going on?
Is there some support for functional dependencies that I'm not aware of?
Sort of, but no.
What you are allowed to do is use the GROUP BY
value in any way you like in the SELECT
, with any functions or calculations, as long as you are not using any columns or values that have not been grouped.
So the below query works, because object_id
has been grouped, and you are allowed to pass a grouping value to a function.
SELECT
max(STATS_DATE(object_id, stats_id)),
object_name(object_id)
-- whatever
FROM sys.stats
GROUP BY object_id
You can add two GROUP BY
values together, or pass them to a function, or cast them etc.
What you cannot do is do the calculation, then group that, and then use the original value from before the calculation. The original value is not present in the GROUP BY
so cannot be used, only the actual value that was used.
SELECT
max(STATS_DATE(object_id, stats_id)),
object_id
FROM sys.stats
GROUP BY object_name(object_id)
Two further notes:
object_name
, because
nvarchar
is less efficient.APPLY
to calculate a value per object.sys.tables
and dynamic management views such as sys.dm_db_stats_properties
.Instead, use the following query:
SELECT
MaxStatsDate,
t.name
FROM sys.tables t
CROSS APPLY (
SELECT MAX(p.last_updated) AS MaxDate
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) p
) s;