ColumnTransformer output column order

I am experiencing an issue with the column order after applying ColumnTransformer. If you run the following code:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder


df = pd.DataFrame({
    'FeatureA': [1.05, 0.5, 2.5],
    'FeatureB': [0, -5, -15],
    'CatFeatureA': ['feat1', 'feat2', 'feat3'],
    'CatFeatureB': ['cat1', 'cat2', 'cat3'],
    'FeatureC': [250, 125.5, 300]
})

transformer = ColumnTransformer(
    [("drop", "drop", ["FeatureC"]),
     ("ordinal", OrdinalEncoder(), ["CatFeatureA", "CatFeatureB"])],
    remainder="passthrough"
)

features = pd.DataFrame(columns=df.drop("FeatureC", axis=1).columns, index=df.index, data=transformer.fit_transform(df))

You will notice that the output is:

Out[70]: 
   FeatureA  FeatureB  CatFeatureA  CatFeatureB
0       0.0       0.0         1.05          0.0
1       1.0       1.0         0.50         -5.0
2       2.0       2.0         2.50        -15.0

Basically, the values are not correctly aligned with the columns: the values under FeatureA and FeatureB are actually the values that should be under CatFeatureA and CatFeatureB, and vice versa.

How can I make sure that the values are correctly aligned? It seems that the features encoded with OrdinalEncoder always go first, but I would like a more robust approach, as the transformer could be expanded in the future.

Solution

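ColumnTransformer concatenates the outputs in the order the transformers are listed and appends the remainder columns last, which is why the encoded columns come first. Once the transformer is fitted, you can get the column names in output order with: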

transformer.get_feature_names_out()

array(['ordinal__CatFeatureA', 'ordinal__CatFeatureB',
       'remainder__FeatureA', 'remainder__FeatureB'], dtype=object)

You could thus use:

features = pd.DataFrame(data=transformer.fit_transform(df),
                        index=df.index,
                        columns=transformer.get_feature_names_out(),
                       )

Or, better, use the set_output API (available from scikit-learn 1.2) to request a DataFrame as output:

transformer.set_output(transform='pandas')
features = transformer.fit_transform(df)

Output:

   ordinal__CatFeatureA  ordinal__CatFeatureB  remainder__FeatureA  remainder__FeatureB
0                   0.0                   0.0                 1.05                  0.0
1                   1.0                   1.0                 0.50                 -5.0
2                   2.0                   2.0                 2.50                -15.0

And if you don't want the transformer-name prefix:

features = features.rename(columns=lambda x: x.split('__', 1)[-1])

Output:

   CatFeatureA  CatFeatureB  FeatureA  FeatureB
0          0.0          0.0      1.05         0
1          1.0          1.0      0.50        -5
2          2.0          2.0      2.50       -15