
<Python + Patsy> Why are dummy variables named with/without T?


Using patsy, I noticed that it sometimes names dummy variables with a "T" and sometimes without. Today I realised that the "T" is attached when the constant term is present in the regression equation, and absent when there is no constant term. For example, compare z[T.1] with z[0] and z[1], indicated by OUTPUT in the following code.

import pandas as pd
import patsy

data = {'z': ['1', '0', '0'],
        'y': [150, 200, 50],
        'x': [200, 210, 90]}
df = pd.DataFrame(data)

# with constant -----------------------
form_const = 'y ~ x + z'
y_const, X_const = patsy.dmatrices(form_const, df, return_type='dataframe')
print(X_const.columns.tolist())

# ['Intercept', 'z[T.1]', 'x'] <- OUTPUT

# withOUT constant --------------------
form_no_const = 'y ~ -1 + x + z'
y_no_const, X_no_const = patsy.dmatrices(form_no_const, df, return_type='dataframe')
print(X_no_const.columns.tolist())

# ['z[0]', 'z[1]', 'x'] <- OUTPUT

Questions

What is the role of T? Does it just indicate the presence of the constant term? If so, isn't it redundant, given that we can always see whether the constant term is present? Does it have any other roles?

Your insight is appreciated in advance.


Solution

  • There are lots of different ways to code categorical variables in a regression. They produce the same predictions, but the actual beta coefficients are different, and if you want to interpret the betas or do hypothesis testing on them, you need to know which coding was used.

    Patsy uses the names as a hint as to which coding system is in use. When there's a "T", that's "treatment coding", and the beta coefficients tell you how the response for the given category differs from some baseline category. When there's no "T", the beta coefficients aren't differences; each one is just the prediction for that category.

    The reason patsy sometimes uses one and sometimes uses the other is that patsy automatically tries to find a full-rank encoding, where the betas all have unique and interpretable values. (The other option is an "overdetermined" model, where there are infinitely many betas that give the same predictions and you need to add some extra arbitrary constraints to fit the model.) If you have an intercept term in your model, then that provides one degree of freedom to start with, and when patsy goes to add the categorical variable it detects that and uses an (n-1)-dimensional encoding, like treatment encoding, and you get the "T". If there isn't an intercept term, then it uses an n-dimensional encoding, and you don't get the "T".
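A quick check of the point above (a sketch assuming numpy, pandas, and patsy are installed, and the column orders shown in your output): the two codings give identical fitted values, and the "T" coefficient is exactly the difference between the two full-rank coefficients.

```python
import numpy as np
import pandas as pd
import patsy

df = pd.DataFrame({'z': ['1', '0', '0'],
                   'y': [150, 200, 50],
                   'x': [200, 210, 90]})

# Treatment coding (with intercept) vs. full-rank coding (no intercept)
y, X_t = patsy.dmatrices('y ~ x + z', df, return_type='dataframe')
_, X_f = patsy.dmatrices('y ~ -1 + x + z', df, return_type='dataframe')

# Ordinary least squares on each design matrix
beta_t, *_ = np.linalg.lstsq(X_t, y, rcond=None)
beta_f, *_ = np.linalg.lstsq(X_f, y, rcond=None)

# Same predictions either way ...
print(np.allclose(X_t @ beta_t, X_f @ beta_f))  # True
# ... but z[T.1]'s beta is the *difference* z[1] - z[0]
print(np.isclose(beta_t[1, 0], beta_f[1, 0] - beta_f[0, 0]))  # True
```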

    Patsy also lets you choose different coding schemes, or even define your own: https://patsy.readthedocs.io/en/latest/API-reference.html#handling-categorical-data
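For instance, patsy's C() wrapper with its built-in Treatment and Sum coding classes (a sketch; exact column names may vary slightly across patsy versions) lets you pick the baseline level or switch to deviation coding:

```python
import pandas as pd
import patsy

df = pd.DataFrame({'z': ['1', '0', '0'],
                   'y': [150, 200, 50],
                   'x': [200, 210, 90]})

# Treatment coding with '1' (rather than the default '0') as the baseline
_, X_ref = patsy.dmatrices("y ~ x + C(z, Treatment('1'))", df,
                           return_type='dataframe')
print(X_ref.columns.tolist())

# Sum (deviation) coding: betas are deviations from the grand mean
_, X_sum = patsy.dmatrices('y ~ x + C(z, Sum)', df,
                           return_type='dataframe')
print(X_sum.columns.tolist())
```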

    For more information about coding schemes in patsy and in general, see: