Assume that, in a machine learning problem, there are several categorical features in the dataset.
One common way to handle categorical features is one-hot encoding. However, in this example, the authors applied OrdinalEncoder to the categorical features before fitting the model and computing feature importances.
I would like to ask if sklearn algorithms, in general, treat OrdinalEncoded features as continuous or categorical features.
If sklearn models treat OrdinalEncoded features as continuous features, is it the correct way to handle categorical features?
In the end, OrdinalEncoded features are just numbers (floats), so, as CutePoison said, they are treated as continuous.
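To make this concrete, here is a minimal sketch of what OrdinalEncoder produces with its default settings. The toy profession column is only an illustration, not data from the example in the question.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy column with one categorical feature (illustrative data only).
X = np.array([["police"], ["teacher"], ["lawyer"], ["engineer"]])

encoder = OrdinalEncoder()  # default output dtype is float64
X_encoded = encoder.fit_transform(X)

print(X_encoded.dtype)         # float64
print(encoder.categories_[0])  # categories in sorted (alphabetical) order
print(X_encoded.ravel())       # [2. 3. 1. 0.]
```

The integer codes follow the alphabetical order of the categories, so a model that reads this column as a number sees "engineer" < "lawyer" < "police" < "teacher", an ordering with no real meaning.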
Is OrdinalEncoder the correct way to handle them? It depends: ask yourself whether the order of the categories matters.
If it matters, you can use OrdinalEncoder. A typical example is the rating of a movie: ["disgusting", "bad", "normal", "good", "super"]. As you can see, "bad" is "smaller" than "normal", so the order carries meaning.
However, in other categorical data such as professions, ["police", "teacher", "lawyer", "engineer"], the order carries no meaning. You can't say that "police" is "smaller" than "lawyer", for example. In that case, you should use OneHotEncoder.
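The two cases could be sketched roughly as follows. The sample rows and variable names are made up for illustration. The rating order is passed explicitly through the categories parameter because, by default, OrdinalEncoder sorts the categories alphabetically, which would not match the intended ranking.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Ordered categories: pass the order explicitly, worst to best.
rating_order = ["disgusting", "bad", "normal", "good", "super"]
ratings = np.array([["good"], ["bad"], ["super"]])  # illustrative rows

ordinal = OrdinalEncoder(categories=[rating_order])
ratings_encoded = ordinal.fit_transform(ratings)
print(ratings_encoded.ravel())  # [3. 1. 4.]

# Unordered categories: one binary column per category.
professions = np.array([["police"], ["teacher"], ["lawyer"], ["engineer"]])

one_hot = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it for display.
professions_encoded = one_hot.fit_transform(professions).toarray()
print(one_hot.categories_[0])  # column order (alphabetical)
print(professions_encoded)
```

With the explicit order, the encoded ratings preserve "bad" < "good" < "super", while each profession gets its own column and no ordering is implied.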
So, in conclusion, it depends on the nature of your categorical data.