ms-accessoledbjet

Why does DISTINCT change case of literal string


I have following simplified query against an Access Database:

SELECT
  id, 
  'value',
  'Value'
FROM 
  test

yields

id Expr1001 Expr1002
4711 value Value

Now the same query with DISTINCT changes the case of the second literal string:

SELECT DISTINCT
  id, 
  'value',
  'Value'
FROM 
  test

yields

id Expr1001 Expr1002
4711 value value

The Outcome is the same no matter if the query comes via OleDb or directly via MS Access. How come that the database engine changes a literal value? I understand that jet sql ist pretty strange sometimes - but this looks to me like a bug. Or is this behavior described somewhere?


Solution

  • This is mostly speculation, but what I think is happening is the following:

    SELECT DISTINCT x is equal to SELECT x GROUP BY x and converted to it.

    Columns in GROUP BY are de-duplicated, 'value' is equal to 'Value' according to the collation rules, thus only the first occurrence is processed.

    For each column, it's either a constant (calculated once), an aggregate column (calculated for each group), or a group column. Since 'value' is in the GROUP BY and 'Value' is equal to 'value', these are both mapped to the group 'value'

    We can see this behaviour too when grouping by individual columns:

    SELECT
      id, 
      'value',
      'Value'
    FROM 
      test
    GROUP BY
      id
    

    No 'value' in the GROUP BY clause, parsed as expressions

    id Expr1001 Expr1002
    4711 value Value
    SELECT
      id, 
      'value',
      'Value'
    FROM 
      test
    GROUP BY
      id, 'valuE'
    

    'value' and 'valuE' are equal, thus Expr1001 is a group column 'valuE, same goes for 'Value'

    id Expr1001 Expr1002
    4711 valuE valuE

    The string from the GROUP BY clause is used, even though neither strings use that capitalization.

    As far as I know, this behaviour is not documented, however. What is documented is that capitalization is ignored when evaluating groups, e.g. you can do SELECT ID, MAX(SomeColumn) FROM MyTable GROUP BY id and Access doesn't complain that there's no ID column in the GROUP BY clause, but it's weird this applies to constants and DISTINCT as well. The obvious workaround is: don't group by constants, use GROUP BY instead of DISTINCT when you have a SELECT clause with constants.

    If we compare this behaviour to T-SQL, it doesn't allow constants in the GROUP BY part, and thus doesn't experience this behavior. GROUP BY id, 'valuE' is not valid T-SQL.