machine-learning neural-network energy rbm

Measuring success of Restricted Boltzmann Machine


I am trying to implement my own RBM, but I am not sure how to measure its success correctly. So I started googling, found many interpretations, and I am not sure which one is correct.

I am facing this problem:

I have a dataset Z, which I can divide into a training set X and a testing set Y. I train the RBM on X and then want to measure its success on Y. More precisely, let's say I have two RBMs and I want to compare them somehow. I am not sure whether reconstruction error on the input vectors is a good measurement, or whether I should compare the RBMs by their energy (and, in that case, how do I calculate the energy over the whole set Y correctly?).

I would also be interested in Gaussian-visible and all-Gaussian units, if possible.


Solution

  • An RBM is an unsupervised learning model, and it is therefore difficult to assess whether one RBM is better than another.

    Nevertheless, RBMs are usually used as pre-training for deeper networks such as DBNs. So my suggestion would be to train as many RBMs as you want to compare (unsupervised learning) and then feed each one's features to a feedforward layer for learning (supervised learning). From there you can assess how good each RBM is by measuring how well the resulting network predicts the class of your data.

    As an example, let's take two RBMs (A and B), and suppose the classifier trained on B's features reaches higher test accuracy than the one trained on A's. Then B is a better RBM than A, as it provided better features, leading to better training and higher out-of-sample results. Note: as the accuracy of a network varies from run to run, make sure you perform the supervised training several times and average the results at the end, so that your comparison is robust. A minimal sketch of this pipeline follows.
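    Here is one possible sketch of that pipeline in Python, using scikit-learn's BernoulliRBM as the unsupervised model and logistic regression as the supervised layer. The dataset, hyperparameters, and hidden-layer sizes are illustrative assumptions, not prescriptions:

    ```python
    # Compare two RBMs by the test accuracy of a classifier trained on
    # their features, averaged over several random seeds.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import BernoulliRBM

    digits = load_digits()
    X = digits.data / 16.0          # scale pixels to [0, 1] for Bernoulli units
    y = digits.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    def score_rbm(n_hidden, n_seeds=5):
        """Train an RBM, then a classifier on its features; average over seeds."""
        accs = []
        for seed in range(n_seeds):
            rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                               n_iter=20, random_state=seed)
            rbm.fit(X_train)
            clf = LogisticRegression(max_iter=1000)
            clf.fit(rbm.transform(X_train), y_train)
            accs.append(clf.score(rbm.transform(X_test), y_test))
        return np.mean(accs)

    # RBM "A" (fewer hidden units) vs RBM "B" (more hidden units)
    print("A:", score_rbm(16))
    print("B:", score_rbm(64))
    ```

    Whichever RBM yields the higher averaged test accuracy is, by this criterion, the better feature extractor.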

    EDIT:

    Regarding unsupervised evaluation, the task is not as simple. As presented by Tijmen Tieleman in "Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient":

    One of the evaluations is how well the learned RBM models the test data, i.e. log likelihood. This is intractable for regular size RBMs, because the time complexity of that computation is exponential in the size of the smallest layer (visible or hidden)

    Yet, if you have small enough RBMs, this is a possible approach; a sketch follows below. Otherwise, you can just wait...
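    For completeness, here is a minimal sketch of the exact test log-likelihood for a tiny binary-binary RBM. The parameter names W (visible-by-hidden weights), b (visible biases), and c (hidden biases) are my own, not from any library. It uses log p(v) = -F(v) - log Z, where the free energy is F(v) = -b'v - sum_j log(1 + exp(c_j + (W'v)_j)) and the partition function Z sums exp(-F(v)) over all visible configurations, which is exactly the exponential enumeration Tieleman warns about:

    ```python
    # Exact average test log-likelihood for a *tiny* binary RBM.
    # Brute-force enumeration of Z: feasible only when one layer is small.
    import itertools
    import numpy as np
    from scipy.special import logsumexp

    def free_energy(v, W, b, c):
        """F(v) = -v.b - sum_j log(1 + exp(c_j + (v W)_j)) for binary units."""
        return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W), axis=-1)

    def avg_log_likelihood(V_test, W, b, c):
        """Mean log p(v) over a test set; exponential in n_visible."""
        n_visible = W.shape[0]
        # Enumerate all 2^n_visible binary visible configurations.
        all_v = np.array(list(itertools.product([0, 1], repeat=n_visible)),
                         dtype=float)
        log_Z = logsumexp(-free_energy(all_v, W, b, c))
        return np.mean(-free_energy(V_test, W, b, c) - log_Z)

    # Toy usage with random parameters (illustrative only).
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(6, 10))   # 6 visible x 10 hidden units
    b, c = np.zeros(6), np.zeros(10)
    V_test = rng.integers(0, 2, size=(100, 6)).astype(float)
    print(avg_log_likelihood(V_test, W, b, c))
    ```

    To compare two small RBMs on your test set Y, compute this average log-likelihood for each; the RBM with the higher value models Y better. If the hidden layer is the smaller one, the same trick applies by summing out the visible units instead.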