I have a simple question concerning the VLAD vector representation. How is it that an 8192-dimensional (k=64, 128-D SIFT) VLAD vector takes '32KB of memory' per image? I could not relate these two numbers.
As described in the VLFeat documentation, each component $v_k$ of the VLAD vector is given by

$$v_k = \sum_{i=1}^{N} q_{ik}\,(x_i - u_k),$$

where $x_i$ is a descriptor vector (here: a 128-dimensional SIFT vector) and $u_k$ is the center of the $k$-th cluster, i.e. also a 128-dimensional vector. $q_{ik}$ denotes the strength of association between $x_i$ and $u_k$, which is either 0 or 1 if hard k-means assignment is used. Thus, each $v_k$ is 128-dimensional.
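Here is a minimal NumPy sketch of this aggregation step with hard assignments. The descriptors and centers are random stand-ins, and all names and sizes are illustrative, not from VLFeat:

```python
import numpy as np

# Toy data: N stand-in "SIFT" descriptors and K cluster centers
# (random placeholders; in practice U comes from k-means on training data).
N, K, D = 1000, 64, 128
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D)).astype(np.float32)  # descriptors x_i
U = rng.standard_normal((K, D)).astype(np.float32)  # cluster centers u_k

# Hard assignment: q_ik = 1 iff u_k is the center nearest to x_i.
nearest = np.linalg.norm(X[:, None, :] - U[None, :, :], axis=2).argmin(axis=1)

# v_k = sum of residuals (x_i - u_k) over all descriptors assigned to cluster k
v = np.zeros((K, D), dtype=np.float32)
for k in range(K):
    assigned = X[nearest == k]
    if len(assigned) > 0:
        v[k] = (assigned - U[k]).sum(axis=0)

print(v.shape)  # (64, 128): one 128-D residual block per cluster
```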
The VLAD vector of an image $I$ is then given by stacking all $v_k$:

$$\hat{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_K \end{bmatrix}$$

This stacked vector consists of $K$ blocks, each of which is 128-dimensional.
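In the same spirit, the stacking step is just concatenation; a small sketch with placeholder blocks:

```python
import numpy as np

K, D = 64, 128
# Placeholder residual blocks v_1, ..., v_K (random stand-ins for the
# per-cluster sums computed above).
v = np.random.default_rng(0).standard_normal((K, D)).astype(np.float32)

vlad = v.reshape(-1)  # stack the K blocks into one long vector
print(vlad.shape)     # (8192,)
```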
Thus, for $K = 64$, we end up with $64 \times 128 = 8192$ numbers describing image $I$.
Finally, if we store each element as a single-precision floating-point number, each number requires 4 bytes of memory. We thus end up with a total memory usage of $64 \times 128 \times 4 = 32768$ bytes, or 32 KB, for the VLAD vector of each image.
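As a quick sanity check (assuming NumPy's `float32` as the 4-byte float type):

```python
import numpy as np

# One VLAD vector stored as 4-byte single-precision floats.
vlad = np.zeros(64 * 128, dtype=np.float32)
print(vlad.nbytes)  # 32768 bytes = 32 KB
```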