I have following simplified query against an Access Database:
CREATE TABLE test (id TEXT)
INSERT INTO test VALUES ('4711')
SELECT
id,
'value',
'Value'
FROM
test
yields
id | Expr1001 | Expr1002 |
---|---|---|
4711 | value | Value |
Now the same query with DISTINCT
changes the case of the second literal string:
SELECT DISTINCT
id,
'value',
'Value'
FROM
test
yields
id | Expr1001 | Expr1002 |
---|---|---|
4711 | value | value |
The Outcome is the same no matter if the query comes via OleDb or directly via MS Access. How come that the database engine changes a literal value? I understand that jet sql ist pretty strange sometimes - but this looks to me like a bug. Or is this behavior described somewhere?
This is mostly speculation, but what I think is happening is the following:
SELECT DISTINCT x
is equal to SELECT x GROUP BY x
and converted to it.
Columns in GROUP BY
are de-duplicated, 'value'
is equal to 'Value'
according to the collation rules, thus only the first occurrence is processed.
For each column, it's either a constant (calculated once), an aggregate column (calculated for each group), or a group column. Since 'value'
is in the GROUP BY
and 'Value'
is equal to 'value'
, these are both mapped to the group 'value'
We can see this behaviour too when grouping by individual columns:
SELECT
id,
'value',
'Value'
FROM
test
GROUP BY
id
No 'value' in the GROUP BY clause, parsed as expressions
id | Expr1001 | Expr1002 |
---|---|---|
4711 | value | Value |
SELECT
id,
'value',
'Value'
FROM
test
GROUP BY
id, 'valuE'
'value'
and 'valuE'
are equal, thus Expr1001 is a group column 'valuE
, same goes for 'Value'
id | Expr1001 | Expr1002 |
---|---|---|
4711 | valuE | valuE |
The string from the GROUP BY
clause is used, even though neither strings use that capitalization.
As far as I know, this behaviour is not documented, however. What is documented is that capitalization is ignored when evaluating groups, e.g. you can do SELECT ID, MAX(SomeColumn) FROM MyTable GROUP BY id
and Access doesn't complain that there's no ID
column in the GROUP BY
clause, but it's weird this applies to constants and DISTINCT
as well. The obvious workaround is: don't group by constants, use GROUP BY
instead of DISTINCT
when you have a SELECT
clause with constants.
If we compare this behaviour to T-SQL, it doesn't allow constants in the GROUP BY
part, and thus doesn't experience this behavior. GROUP BY id, 'valuE'
is not valid T-SQL.