sqlfluff

"sqlfluff" SQL linter: how does it decide to "uppercase" or to "lowercase" identifiers?


I'm doing baby steps with the sqlfluff SQL linter version 3.0.3 (version can be obtained with sqlfluff version) on the command line (there is also a online application).

So I'm finding this:

An attempt that results in upper-casing of identifiers

SELECT COUNT(DISTINCT A1.user_id) AS logged_user_empty_cart
FROM Activity AS A1, Activity AS A2
WHERE
    A1.user_id = A2.user_id
    AND A1.activity_type = 'login'
    AND A2.activity_type = 'add_to_cart'

Then run:

$ sqlfluff fix --dialect sqlite Cart.uc.sql

We get messages like:

L:   1 | P:  26 | CP02 | Unquoted identifiers must be consistently upper case.
                       | [capitalisation.identifiers]

And the result is massively 60s IBM kind of ugly:

SELECT COUNT(DISTINCT A1.USER_ID) AS LOGGED_USER_EMPTY_CART
FROM ACTIVITY AS A1, ACTIVITY AS A2
WHERE
    A1.USER_ID = A2.USER_ID
    AND A1.ACTIVITY_TYPE = 'login'
    AND A2.ACTIVITY_TYPE = 'add_to_cart'

An attempt that results in lower-casing of identifiers

On the other hand, take this text, which differs from the above in the lowercasing of the aliases only:

SELECT COUNT(DISTINCT a1.user_id) AS logged_user_empty_cart
FROM ACTIVITY AS a1, ACTIVITY AS a2
WHERE
    a1.user_id = a2.user_id
    AND a1.activity_type = 'login'
    AND a2.activity_type = 'add_to_cart'

Again:

$ sqlfluff fix --dialect sqlite Cart.lc.sql 

We get messages like:

L:   2 | P:   6 | CP02 | Unquoted identifiers must be consistently lower case.
                       | [capitalisation.identifiers]

And the result is much nicer (at least to my eyes), it's just the aliases that have been lowercased:

SELECT COUNT(DISTINCT a1.user_id) AS logged_user_empty_cart
FROM activity AS a1, activity AS a2
WHERE
    a1.user_id = a2.user_id
    AND a1.activity_type = 'login'
    AND a2.activity_type = 'add_to_cart'

It seems that sqlfluff arbitrarily decides to uppercase and lowercase identifiers based on what it encounters in the to-be-linted text. Is there some way to control that behaviour?


Solution

  • Rule CP02 ensures that identifiers in a query have always the same casing.

    When fixing, if the first identifier encountered is in lower case (resp. upper case), then all the following identifiers will be lower cased (resp. upper cased).

    In the samples, A1 is the first identifier encountered. Once in uppercase, the other in lowercase. Hence the fixes.