
An 8192-dimensional VLAD vector takes 32KB of memory per image. How?


I have a simple question concerning the VLAD vector representation. How is it that an 8192-dimensional (k=64, 128-D SIFT) VLAD vector takes "32KB of memory" per image? I could not relate these two numbers.


Solution

  • As described in the VLFeat documentation, each block v_k of the VLAD vector is given by

     v_k = sum_i q_{ik} (x_i - mu_k)

    where x_i is a local descriptor vector (here: a 128-dimensional SIFT descriptor) and mu_k is the center of the k-th cluster, which is itself also a 128-dimensional vector. q_{ik} denotes the strength of association between x_i and mu_k; with hard k-means assignment it is 1 if mu_k is the nearest center to x_i and 0 otherwise. Thus, each v_k is 128-dimensional. (A small code sketch at the end of this answer illustrates this aggregation.)

    The VLAD vector of an image I is then given by stacking all v_k:

     V(I) = [v_1, v_2, ..., v_k]

    This vector consists of k such blocks, each of them 128-dimensional. Thus, for k = 64, we end up with 64 * 128 = 8192 numbers describing image I.

    Finally, if we store each of these numbers as a single-precision floating-point value, each one requires 4 bytes of memory. We thus end up with a total of 64 * 128 * 4 = 32768 bytes, i.e. 32KB, for the VLAD vector of each image. (The second snippet below verifies this memory figure.)
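
    For concreteness, here is a minimal NumPy sketch of the hard-assignment aggregation described above. The function name vlad and the input shapes are illustrative assumptions, and the normalization steps that VLFeat applies on top (power and L2 normalization) are omitted here, since they do not change the dimensionality:

        import numpy as np

        def vlad(descriptors, centers):
            """Aggregate local descriptors into a VLAD vector (hard assignment).

            descriptors: (n, 128) array of SIFT descriptors for one image
            centers:     (k, 128) array of k-means centers mu_k
            Returns a flat (k * 128)-dimensional VLAD vector.
            """
            k, d = centers.shape
            # q_ik = 1 for the nearest center, 0 otherwise (hard k-means assignment)
            nearest = np.argmin(
                np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2),
                axis=1,
            )
            v = np.zeros((k, d), dtype=np.float32)
            for i, x in zip(nearest, descriptors):
                v[i] += x - centers[i]  # accumulate the residual x_i - mu_k
            return v.ravel()  # stack the k residual sums: k * 128 numbers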
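
    And a quick check of the memory figure itself, assuming single-precision storage as above:

        import numpy as np

        k, d = 64, 128
        v = np.zeros(k * d, dtype=np.float32)  # 4 bytes per element
        print(v.size)    # 8192 dimensions
        print(v.nbytes)  # 32768 bytes = 32KB per image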