I have a reference image A and two target images, B and C. To a human observer, A and B belong to the same class, while A and C belong to different classes. I tried to measure the SSIM as follows:
result1 = SSIM(A , B) = 4.71027%;
result2 = SSIM(A , C) = 7.95047%;
I used the SSIM code from OpenCV.
I also tried comparing normalized LBP histograms of the entire images by calculating the KL divergence between the two histograms, but the results were worse.
Is there a way to measure the similarity without training?
Following @Cris Luengo's suggestion, here are the results for two LBP variants, circular and variance-based. It seems that the choice of method (feature descriptor) is critical (result = 0 means identical):
result1 = LBP_CIRCULAR_HIST_KL(A , B) = 0.66;
result2 = LBP_CIRCULAR_HIST_KL(A , C) = 0.64;
result1 = LBP_VAR_HIST_KL(A , B) = 0.49;
result2 = LBP_VAR_HIST_KL(A , C) = 3.74;
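For readers who want to reproduce this kind of comparison, here is a minimal NumPy sketch of the general idea: compute an LBP code per pixel, build a normalized histogram, and compare two histograms with the KL divergence. It is an assumption-laden illustration, not the code used for the numbers above: it uses the basic 3x3 LBP rather than the circular or variance-based variants, the function names `lbp_histogram` and `kl_divergence` are made up for this example, and the `eps` smoothing is just one way to avoid dividing by empty bins. The test images are synthetic.

```python
import numpy as np

def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 3x3 LBP codes (border pixels skipped)."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # The 8 neighbors of each pixel, visited clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbor is at least as bright as the center.
        codes |= (neighbor >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) with a small additive smoothing term (eps is an assumption)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic stand-ins: two noise textures (same "class") and a striped pattern.
rng = np.random.default_rng(0)
noise_a = rng.random((64, 64))
noise_b = rng.random((64, 64))
stripes = np.tile(np.arange(64) % 8 < 4, (64, 1)).astype(float)

print("noise vs noise:  ", kl_divergence(lbp_histogram(noise_a), lbp_histogram(noise_b)))
print("noise vs stripes:", kl_divergence(lbp_histogram(noise_a), lbp_histogram(stripes)))
```

Note that the KL divergence is not symmetric, so swapping the two histograms generally gives a different value.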
As the comments suggest, SSIM will not work if the two images are not pixel-aligned. You can measure the similarity between two unaligned images in a variety of ways. One of the most popular nowadays is CLIP, which is also used as a component of generative models such as Stable Diffusion.
I suggest you look at this repo, which explains how to install CLIP for Python and how to extract features and similarities. The example there is for image-text similarity, but you can compute image-image similarity by doing something like:
import torch
import torch.nn.functional as F
import clip
from PIL import Image

# Use the GPU when available; CLIP also runs on the CPU, just more slowly.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Preprocess each image and add a batch dimension.
image1 = preprocess(Image.open("Image1.png")).unsqueeze(0).to(device)
image2 = preprocess(Image.open("Image2.png")).unsqueeze(0).to(device)

# Encode both images without tracking gradients.
with torch.no_grad():
    image1_features = model.encode_image(image1)
    image2_features = model.encode_image(image2)

# Cosine similarity lies in [-1, 1]; higher means more similar.
sim = F.cosine_similarity(image1_features, image2_features)
print("Cosine similarity:", sim.item())
Note that this might be quite slow, depending on how many samples you have and what kind of task you want to run (brute-force retrieval might not be feasible).
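One way to keep repeated comparisons cheap, assuming you can encode every image once and store the resulting feature vectors, is to L2-normalize the stored features and get all cosine similarities with a single matrix product. The sketch below only illustrates that idea: the helper names `l2_normalize` and `top_k_similar` are made up for this example, and random 512-dimensional vectors stand in for real ViT-B/32 image features. For very large collections this is still a linear scan, so it does not remove the scaling concern.

```python
import numpy as np

def l2_normalize(features):
    """Scale each row to unit length so that dot products equal cosine similarities."""
    features = np.asarray(features, dtype=np.float32)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)

def top_k_similar(query, gallery, k=3):
    """Return indices and cosine similarities of the k gallery rows closest to query."""
    query_unit = l2_normalize(np.asarray(query)[None, :])[0]
    sims = l2_normalize(gallery) @ query_unit
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Random stand-ins for precomputed image features (not real CLIP outputs).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))
# A query that is a slightly perturbed copy of gallery row 42.
query = gallery[42] + 0.01 * rng.normal(size=512)

indices, scores = top_k_similar(query, gallery, k=3)
print("Closest gallery rows:", indices)
print("Cosine similarities: ", scores)
```

With real data, `gallery` would hold the stacked outputs of `model.encode_image` moved to the CPU and converted to NumPy, computed once and reused across queries.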