Tags: python, computer-vision, pytorch, feature-extraction

How to extract a feature vector from a single image in PyTorch?


I am trying to understand computer vision models and how to interpret their feature vectors, so I'm using PyTorch to extract a feature vector from a single image. Below is the code I've pieced together from various places.

import torch
import torchvision
import torchvision.models as models
from PIL import Image



img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Apply the preprocessing transforms to the image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)

When I do this I get the following error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead

I'm sure this is an elementary error, but I can't seem to figure out how to fix it. It was my impression that the ToTensor transform would make my data 4-dimensional, but it seems it's either not working as I expect or I'm misunderstanding it. I'd appreciate any help or resources I can use to learn more about this!


Solution

  • All the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...) then the output will be (B, ...) as well (though the trailing dimensions may change depending on the layer). This behavior allows efficient inference on batches of B inputs simultaneously. The 4-dimensional weight [64, 3, 7, 7] in your error is the kernel of ResNet's first convolution, which expects a (B, 3, H, W) input; ToTensor only produces a 3-dimensional (C, H, W) tensor for a single image. To make your code conform, just unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it, since for a single image the 'avgpool' output has shape (1, 512, 1, 1) and your my_embedding tensor is one-dimensional. A quick shape check below illustrates both points.
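
    A standalone shape check illustrating both fixes (the tensors here are just stand-ins for the real transformed image and avgpool output):

    import torch

    t_img = torch.rand(3, 224, 224)          # what transforms(img) returns: one (C, H, W) image
    print(t_img.unsqueeze(0).shape)          # torch.Size([1, 3, 224, 224]) -- a batch of one

    avgpool_out = torch.rand(1, 512, 1, 1)   # shape of resnet18's avgpool output for one image
    print(avgpool_out.flatten().shape)       # torch.Size([512]) -- matches my_embedding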

    A couple of other things:

    • Inference should be run inside a torch.no_grad() context; you don't need gradients just to read off activations, and skipping them saves memory and time.
    • torch.autograd.Variable is deprecated; transforms(img) already returns a tensor you can feed straight to the model.
    • get_vector should accept the already-opened image rather than re-opening a hardcoded path and ignoring its argument.

    Updated code is then as follows:

    import torch
    import torchvision
    import torchvision.models as models
    from PIL import Image
    
    img = Image.open("Documents/01235.png")
    
    # Load the pretrained model
    model = models.resnet18(pretrained=True)
    
    # Use the model object to select the desired layer
    layer = model._modules.get('avgpool')
    
    # Set model to evaluation mode
    model.eval()
    
    transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    
    
    def get_vector(image):
        # Create a PyTorch tensor with the transformed image
        t_img = transforms(image)
        # Create a vector of zeros that will hold our feature vector
        # The 'avgpool' layer has an output size of 512
        my_embedding = torch.zeros(512)
    
        # Define a function that will copy the output of a layer
        def copy_data(m, i, o):
            my_embedding.copy_(o.flatten())                 # <-- flatten
    
        # Attach that function to our selected layer
        h = layer.register_forward_hook(copy_data)
        # Run the model on our transformed image
        with torch.no_grad():                               # <-- no_grad context
            model(t_img.unsqueeze(0))                       # <-- unsqueeze
        # Detach our copy function from the layer
        h.remove()
        # Return the feature vector
        return my_embedding
    
    
    pic_vector = get_vector(img)
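
    As a quick usage check, pic_vector is a 512-element tensor. If you extract a second image's vector (the second filename below is purely hypothetical), cosine similarity gives a rough sense of how alike the two images look to the network:

    print(pic_vector.shape)    # torch.Size([512])

    # Hypothetical second image, just to illustrate comparing embeddings
    other_vector = get_vector(Image.open("Documents/04321.png"))
    similarity = torch.nn.functional.cosine_similarity(pic_vector, other_vector, dim=0)
    print(similarity.item())   # closer to 1.0 means more similar feature vectors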