keras, autoencoder

Reconstruction error per feature for autoencoders?


I'm using autoencoders for clustering, and I'd like to figure out feature importance by using reconstruction error per feature. Here's what I tried:

import keras.backend as K

# custom loss: mean squared error computed separately for each feature (column)
def mse_per_feature(y_true, y_pred):
    mse = K.mean(K.square(y_true - y_pred), axis=0)
    return mse

model.compile(optimizer='adam', loss=mse_per_feature)

reconstructed_output = model.predict(x_test)

# per-feature reconstruction error on the test set
mse_per_feature = ((x_test - reconstructed_output)**2).mean(axis=0)
print(mse_per_feature)

But that didn't work because x_test and reconstructed_output have different dimensions. ChatGPT says I need a decoder model to make this work, but I have no idea what kind of decoder model to build (I am a beginner here). The encoder model is like this:

from tensorflow.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, Dropout
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import glorot_uniform
from keras.optimizers import SGD

encoding_dim = 7

input_df = Input(shape=(17,))


# Glorot uniform initializer (Xavier uniform initializer) draws samples from a uniform distribution within [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out))

x = Dense(encoding_dim, activation='relu')(input_df)
x = Dense(500, activation='relu', kernel_initializer = 'glorot_uniform')(x)
x = Dense(500, activation='relu', kernel_initializer = 'glorot_uniform')(x)
x = Dense(2000, activation='relu', kernel_initializer = 'glorot_uniform')(x)

encoded = Dense(10, activation='relu', kernel_initializer = 'glorot_uniform')(x)

x = Dense(2000, activation='relu', kernel_initializer = 'glorot_uniform')(encoded)
x = Dense(500, activation='relu', kernel_initializer = 'glorot_uniform')(x)

decoded = Dense(17, kernel_initializer = 'glorot_uniform')(x)

# autoencoder
autoencoder = Model(input_df, decoded)

# encoder - used for our dimension reduction
encoder = Model(input_df, encoded)

autoencoder.compile(optimizer= 'adam', loss='mean_squared_error')

Could someone please help?


Solution

  • One of the most common uses of autoencoders is feature reduction. In this case, you use an encoder and a decoder during the training phase. The encoder compresses the input to a lower-dimensional latent space (also called the bottleneck), and the decoder expands it back to the original dimension. By keeping the reconstruction error low, you ensure that the encoder reduces the dimensionality while preserving the features that carry the most informative content.


    For feature reduction tasks, during the prediction phase you use only the already trained encoder to extract the compressed (and hopefully most informative) features from new data.
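    For instance, since you mention using the autoencoder for clustering, a minimal sketch of that workflow (assuming scikit-learn is available, that x_test is a NumPy array with 17 columns, and that 3 clusters is only a placeholder) could look like this:

    from sklearn.cluster import KMeans

    # compress each 17-dimensional sample to the 10-dimensional latent space
    latent = encoder.predict(x_test)   # shape: (n_samples, 10)

    # cluster in the latent space instead of the original feature space
    kmeans = KMeans(n_clusters=3, random_state=0).fit(latent)
    print(kmeans.labels_)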

    Usually, autoencoders are symmetric structures, so you can build a decoder that mirrors the encoder, as in the sketch below.
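    If you want the decoder as a standalone model (for example, to reconstruct samples directly from latent vectors), one possible sketch, assuming the autoencoder defined in your code, is to reuse the trained layers that come after the bottleneck:

    # standalone decoder built from the already trained autoencoder layers
    latent_input = Input(shape=(10,))           # same size as the 'encoded' layer
    d = autoencoder.layers[-3](latent_input)    # Dense(2000)
    d = autoencoder.layers[-2](d)               # Dense(500)
    decoder_output = autoencoder.layers[-1](d)  # Dense(17)
    decoder = Model(latent_input, decoder_output)

    # decoder(encoder(x)) gives the same reconstruction as autoencoder(x)
    reconstruction = decoder.predict(encoder.predict(x_test))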

    A great resource for learning about autoencoders is the Deep Learning book (Goodfellow et al.).

    Your code is not working because reconstructed_output = model.predict(x_test) is not actually the reconstructed output but the encoded output at the latent-space level. The dimensions differ because the latent space has dimension 10, which is lower than the input dimension of 17.
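    You can check this quickly (assuming x_test is a NumPy array with 17 columns) by comparing the output shapes of the two models:

    # the encoder outputs latent vectors, the autoencoder outputs reconstructions
    print(encoder.predict(x_test).shape)      # (n_samples, 10) -> cannot be compared to x_test
    print(autoencoder.predict(x_test).shape)  # (n_samples, 17) -> same shape as x_test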

    Using the second snippet of code you provided, you should calculate the error like this:

    from tensorflow.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, Dropout
    from tensorflow.keras.models import Model, load_model
    from tensorflow.keras.initializers import glorot_uniform
    from keras.optimizers import SGD
    
    encoding_dim = 7
    
    input_df = Input(shape=(17,))
    
    
    # Glorot uniform initializer (Xavier uniform initializer) draws samples from a uniform distribution within [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out))
    
    x = Dense(encoding_dim, activation='relu')(input_df)
    x = Dense(500, activation='relu', kernel_initializer = 'glorot_uniform')(x)
    x = Dense(500, activation='relu', kernel_initializer = 'glorot_uniform')(x)
    x = Dense(2000, activation='relu', kernel_initializer = 'glorot_uniform')(x)
    
    encoded = Dense(10, activation='relu', kernel_initializer = 'glorot_uniform')(x)
    
    x = Dense(2000, activation='relu', kernel_initializer = 'glorot_uniform')(encoded)
    x = Dense(500, activation='relu', kernel_initializer = 'glorot_uniform')(x)
    
    decoded = Dense(17, kernel_initializer = 'glorot_uniform')(x)
    
    # autoencoder
    autoencoder = Model(input_df, decoded)
    
    # encoder - used for our dimension reduction
    encoder = Model(input_df, encoded)
    
    autoencoder.compile(optimizer= 'adam', loss='mean_squared_error')
    
    reconstructed_output = autoencoder.predict(x_test)
    
    mse_per_feature = ((x_test - reconstructed_output)**2).mean(axis=0)
    
    print(mse_per_feature)
    

    Note that at the beginning you set encoding_dim = 7, but then you hardcode 10 as the dimension of the latent space (the encoded layer); encoding_dim only controls the size of the first hidden layer, so you probably want to make the two consistent.
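    Once you have mse_per_feature, a short sketch (assuming NumPy is imported and that feature_names is a hypothetical list of your 17 column names) for turning it into a feature-importance ranking:

    import numpy as np

    # features with the largest reconstruction error are the hardest to compress;
    # rank them from highest to lowest per-feature MSE
    ranking = np.argsort(mse_per_feature)[::-1]
    for idx in ranking:
        print(feature_names[idx], mse_per_feature[idx])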