[SOLVED] Is it possible to rank the features based on their importance using autoencoder?

I am using an autoencoder for the first time. I have learned that it reduces the dimensionality of the input data set, but I am not sure what that actually means. Does it select specific features from the input features? Is it possible to rank the features using an autoencoder?

My data looks like this:

age   height        weight     working_hour     rest_hour   Diabetic
54    152            72           8                 4         0
62    159            76           7                 3         0
85    157            79           7                 4         1
24    153            75           8                 4         0
50    153            79           8                 4         1
81    154            80           7                 3         1

The features are age, height, weight, working_hour and rest_hour. The target column is Diabetic. I have 5 features and want to use fewer. That is why I want to use an autoencoder to select the best features for the prediction.

Solution

Generally, this is not possible with a vanilla autoencoder (AE). An AE learns a non-linear mapping from the input to a hidden (bottleneck) representation and back to the original space. Each hidden unit is a mixture of all input features, so the AE does not select individual features, and the mapping cannot be read as a feature ranking. You could use constrained AEs, but I would not recommend that if you are working with AEs for the first time.

However, if all you want is a lower-dimensional input, you can train an embedding instead. Train the AE with the desired number of nodes in the bottleneck and use the output of the encoder as the input to your other algorithm.

You can split the AE into two functions: encoder (E) and decoder (D). The forward pass is then D(E(x)), where x is your input. After you have finished training the AE (with a reasonable reconstruction error!), you compute only E(x) and feed it to your other algorithm.
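To make the E/D split concrete, here is a minimal sketch of a tiny autoencoder written in plain NumPy, trained on the six rows from the question. The bottleneck size of 2, the tanh activation, the learning rate and the number of steps are my own assumptions for illustration. In practice you would use a framework and far more data, so treat this as a toy, not a recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature columns from the question (Diabetic is the target and is left out).
X = np.array([
    [54, 152, 72, 8, 4],
    [62, 159, 76, 7, 3],
    [85, 157, 79, 7, 4],
    [24, 153, 75, 8, 4],
    [50, 153, 79, 8, 4],
    [81, 154, 80, 7, 3],
], dtype=float)

# Standardize so that all features are on a comparable scale.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

n_features, n_hidden = X_std.shape[1], 2  # bottleneck size is an assumption
W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
b_dec = np.zeros(n_features)

def encoder(x):
    """E(x): map the 5 input features to the 2-dimensional bottleneck."""
    return np.tanh(x @ W_enc + b_enc)

def decoder(z):
    """D(z): map the bottleneck back to the 5 input features."""
    return z @ W_dec + b_dec

def reconstruction_loss(x):
    """Mean squared error of D(E(x)) against x."""
    return float(np.mean((decoder(encoder(x)) - x) ** 2))

initial_loss = reconstruction_loss(X_std)

lr = 0.05  # assumed learning rate
for _ in range(5000):
    Z = encoder(X_std)
    X_hat = decoder(Z)
    # Backpropagate the mean squared error through decoder and encoder.
    grad_out = 2.0 * (X_hat - X_std) / X_std.size
    grad_W_dec = Z.T @ grad_out
    grad_b_dec = grad_out.sum(axis=0)
    grad_pre = (grad_out @ W_dec.T) * (1.0 - Z ** 2)  # derivative of tanh
    grad_W_enc = X_std.T @ grad_pre
    grad_b_enc = grad_pre.sum(axis=0)
    # Plain gradient descent step (in place, so encoder/decoder see the update).
    W_dec -= lr * grad_W_dec
    b_dec -= lr * grad_b_dec
    W_enc -= lr * grad_W_enc
    b_enc -= lr * grad_b_enc

final_loss = reconstruction_loss(X_std)
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")

# After training, only E(x) is used: this is the input for the other algorithm.
embedding = encoder(X_std)
print(embedding.shape)
```

Note that each column of `embedding` is a non-linear mix of all five original features, which is why the result gives you fewer inputs but not a ranking of the original ones.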

Another option is PCA, which is basically a linear AE. You can set a maximum number of hidden dimensions (components) and evaluate how much each one contributes to reducing the reconstruction error. Furthermore, it is much easier to implement, and you do not need to know TensorFlow or PyTorch.
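As a rough illustration of the PCA route, the sketch below computes PCA with NumPy's SVD on the six rows from the question and prints the reconstruction error for each number of kept components. Standardizing the features first and keeping k = 2 components are my own assumptions, and six samples are far too few for any real conclusion.

```python
import numpy as np

# Feature columns from the question (Diabetic is the target and is left out).
X = np.array([
    [54, 152, 72, 8, 4],
    [62, 159, 76, 7, 3],
    [85, 157, 79, 7, 4],
    [24, 153, 75, 8, 4],
    [50, 153, 79, 8, 4],
    [81, 154, 80, 7, 3],
], dtype=float)

# Standardize so that no feature dominates just because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

# Share of the total variance carried by each component.
explained_ratio = S ** 2 / np.sum(S ** 2)

def reconstruction_error(k):
    """Mean squared error after keeping only the first k components."""
    Z = X_std @ Vt[:k].T   # "encode": project onto k directions
    X_hat = Z @ Vt[:k]     # "decode": map back to the 5 features
    return float(np.mean((X_std - X_hat) ** 2))

errors = [reconstruction_error(k) for k in range(6)]
for k, err in enumerate(errors):
    print(f"k={k}: reconstruction MSE = {err:.4f}")

# Reduced input for the downstream model (k = 2 is an arbitrary choice here).
k = 2
embedding = X_std @ Vt[:k].T
print(embedding.shape)
```

The drop in error from k-1 to k components is exactly that component's share in `explained_ratio`, so you can pick the smallest k whose error is acceptable. As with the AE, each component is a (linear) mix of all original features, not a selected subset.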