python-3.x keras deep-learning feature-extraction vgg-net

can we extract VGG16/19 features for classes it was not trained on

I have a query regarding the extraction of VGG16/VGG19 features for my experiments.

The pre-trained VGG16 and VGG19 models have been trained on ImageNet dataset having 1000 classes (say c1,c2, ... c1000) and normally we extract the features from first and second fully connected layers designated ('FC1' and 'FC2'); these 4096 dimensional feature vectors are then used for computer vision tasks.

My question is that can we use these networks to extract features of an image that does not belong to any of the above 1000 classes ? In other words, can we use these networks to extract features of an image with label c1001 ? Remember that c1001 does not belong to the Imagenet classes on which these networks were initially trained on.

In the article available on https://www.pyimagesearch.com/2019/05/20/transfer-learning-with-keras-and-deep-learning/, I am quoting the following -

When performing feature extraction, we treat the pre-trained network as an arbitrary feature extractor, allowing the input image to propagate forward, stopping at pre-specified layer, and taking the outputs of that layer as our features

From the above text, there is no restriction to whether the image must necessarily belong to one of the Imagenet classes.

Kindly spare some time to uncover this mystery.

In the research papers, the authors simply state that they have used features extracted from VGG16/VGG19 network pre-trained on Imagenet dataset without giving any further details.

I am giving a case study for reference:

Animal with Attribute dataset (see https://cvml.ist.ac.at/AwA2/) is a very popular dataset with 50 animal classes for image recognition task. The authors have extracted ILSVRC-pretrained ResNet101 features for the above dataset images. This ResNet 101 network has been pre-trained on 1000 imagenet classes (different imagenet classes are available at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a#file-imagenet1000_clsidx_to_labels-txt).

Also, the AWA classes are put as follows:

antelope,  grizzly+bear, killer+whale, beaver, dalmatian, persian+cat, horse
german+shepherd, blue+whale, siamese+cat, skunk, mole, tiger, hippopotamus, leopard, moose, spider+monkey, humpback+whale, elephant, gorilla, ox, fox, sheep
seal, chimpanzee, hamster, squirrel, rhinoceros, rabbit, bat, giraffe, wolf, chihuahua, rat, weasel, otter, buffalo, zebra, giant+panda, deer, bobcat, pig, lion, mouse, polar+bear, collie, walrus, raccoon, cow, dolphin

Now, if we compare the classes in the dataset with 1000 Imagenet classes, we find that classes like dolphin, cow, racoon, bobcat, bat, seal, sheep, horse, grizzly bear, giraffe etc are not there in the Imagenet and still the authors went on with extracting ResNet101 features. I believe that the features extracted are generalizable and that is why authors consider these features as meaningful representations for the AWA images.

Your take on this ?

The idea is to get the representations for the images not belonging to ImageNet classes and use them along with their labels in some other classifier.

Solution

For only extracting the features, you can input any image you want in your pretrained VGG/other CNN. However, for the purpose of training, you have to implement other steps as stated below.

The features that are extracted have been determined by means of exclusively training on those 1000 classes and belong to those 1000 classes. You can use your network to predict on images that do not belong to those 1000 classes, but in the paragraphs below I explain why this is not the desired approach.

The key point to outline here is that, the set features that were extracted can be used to detect/determine the presence of other objects within a photo, but not "ready"/"out of the box".

For example, edges and lines are features that are not related exclusively to those 1000 classes, but also to other ones, hence they are useful, general features.

Therefore, you can employ "transfer learning", to train on your own images (dataset), for example c1001, c1002, c1003.

Notice however that you need to train on your own set before you can use the network to predict on your new images(new classes). Transfer learning refers to using the set of already gathered/learned features, which can be suitable to apply on another problem, but you need to train on your "new problem", say c1001, c1002, c1003.