matlab · image-processing · computer-vision · vlfeat · vlad-vector

Extracting VLAD from SIFT Descriptors in VLFeat with Matlab


I have a folder of images. I want to compute VLAD features from each image.

I loop over each image, load it, and obtain the SIFT descriptors as follows:

repo = '/media/data/images/';
filelist = dir([repo '*.jpg']);
sift_descr = {};

for i = 1:size(filelist, 1)
    I = imread([repo filelist(i).name]) ;
    I = single(rgb2gray(I)) ;
    [f,d] = vl_sift(I) ;
    sift_descr{i} = d;
end

However, VLAD requires the descriptors to be a single 2D matrix, while my loop produces a cell array. See here. What is the correct way to process my SIFT descriptors before VLAD encoding? Thank you.


Solution

  • First, you need to obtain a dictionary of visual words, or to be more specific: cluster the SIFT features of all images using k-means clustering. In [1], a coarse clustering with e.g. 64 or 256 clusters is recommended.

    For that, we have to concatenate all descriptors into one matrix, which we can then pass to the vl_kmeans function. Further, we convert the descriptors from uint8 to single, as the vl_kmeans function requires the input to be either single or double.

    all_descr = single([sift_descr{:}]);
    centroids = vl_kmeans(all_descr, 64);
    

    Second, you have to create an assignment matrix of size NumberOfClusters-by-NumberOfDescriptors, which assigns each descriptor to a cluster. You have a lot of flexibility in creating this assignment matrix: you can use soft or hard assignments, and you can use simple nearest-neighbor search, kd-trees, or other approximate or hierarchical nearest-neighbor schemes at your discretion.
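    For illustration, the simplest variant, a hard assignment via brute-force nearest-centroid search without any kd-tree, could be sketched as follows (a sketch only; it assumes `centroids` from above and uses VLFeat's pairwise-distance helper vl_alldist2 on one image's descriptors):

    % Hard assignment without a kd-tree: brute-force nearest centroid.
    descr = single(sift_descr{1});           % 128-by-N descriptors of one image
    dist = vl_alldist2(centroids, descr);    % 64-by-N squared L2 distances
    [~, nn] = min(dist, [], 1);              % nearest centroid per descriptor
    assignments = zeros(64, size(descr, 2), 'single');
    assignments(sub2ind(size(assignments), nn, 1:size(descr, 2))) = 1;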

    In the tutorial, they use kd-trees, so let's stick to that: First, a kd-tree has to be built. This operation belongs right after finding the centroids:

    kdtree = vl_kdtreebuild(centroids);
    

    Then, we are ready to construct the VLAD vector for each image. Thus, we have to go through all images again and calculate their VLAD vectors independently. First, we create the assignment matrix exactly as described in the tutorial. Then, we can encode the SIFT descriptors using the vl_vlad function. The resulting VLAD vector will have the size NumberOfClusters * SiftDescriptorSize, i.e. 64*128 in our example.

    enc = zeros(64*128, numel(sift_descr));
    
    for k=1:numel(sift_descr)
    
        % Create assignment matrix
        nn = vl_kdtreequery(kdtree, centroids, single(sift_descr{k}));
        assignments = zeros(64, numel(nn), 'single');
        assignments(sub2ind(size(assignments), nn, 1:numel(nn))) = 1;
    
        % Encode using VLAD
        enc(:, k) = vl_vlad(single(sift_descr{k}), centroids, assignments);
    end
    

    Finally, we have the high-dimensional VLAD vectors for all images in the database. Usually, you'll want to reduce the dimensionality of the VLAD descriptors e.g. using PCA.
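    As a sketch of that last step, the reduction could be done with MATLAB's pca function (Statistics and Machine Learning Toolbox); the target dimension of 128 is an arbitrary choice here, and it assumes you have more than 128 images:

    % PCA on the VLAD matrix; pca expects observations in rows.
    [coeff, score, ~, ~, ~, mu] = pca(enc');
    enc_reduced = score(:, 1:128)';          % 128-by-NumberOfImages
    % A new VLAD vector is projected with the same basis and mean:
    % new_reduced = coeff(:, 1:128)' * (new_vlad - mu');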

    Now, given a new image which is not in the database, you can extract the SIFT features using vl_sift, create the assignment matrix with vl_kdtreequery, and create the VLAD vector for that image using vl_vlad. So, you don't have to find new centroids or create a new kd-tree:

    % Load image and extract SIFT features
    new_image = imread('filename.jpg');
    new_image = single(rgb2gray(new_image));
    [~, new_sift] = vl_sift(new_image);
    
    % Create assignment matrix
    nn = vl_kdtreequery(kdtree, centroids, single(new_sift));
    assignments = zeros(64, numel(nn), 'single');
    assignments(sub2ind(size(assignments), nn, 1:numel(nn))) = 1;
    
    % Encode using VLAD
    new_vlad = vl_vlad(single(new_sift), centroids, assignments);
    
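    As a usage sketch, the new VLAD vector can then be compared against the database, e.g. by cosine similarity (assuming `enc` and `filelist` from above are still in scope):

    % Rank database images by cosine similarity to the new VLAD vector.
    norms = sqrt(sum(enc.^2, 1))';           % column norms of the database
    sims  = (enc' * new_vlad) ./ (norms * norm(new_vlad));
    [~, best] = max(sims);
    fprintf('Most similar database image: %s\n', filelist(best).name);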

    [1] Arandjelovic, R., & Zisserman, A. (2013). All About VLAD. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1578–1585. https://doi.org/10.1109/CVPR.2013.207