I want to recognize boxes in images. I have a database of these boxes that stores their OCR text and images. I run a search and get a rough transformation of the box face using the OCR. This works fine most of the time, but sometimes it returns the wrong face and the wrong transformation. Since I have the source images, I want to use them to verify the search result. I transform the detected box area onto the source image and resize both to the same size (so they have a similar perspective and size). As embedding features I have tried HOG, the second-to-last layer of AlexNet, ViT-MAE, and my own self-trained conv network, but none of them works well. I also tried keypoint features, but they take far longer than my time budget allows, and they fail to distinguish faces with the same print but different sizes.
Is there any other effective way to compare two similar images?
Try image hashing: it produces a compact, fixed-length hash that represents an image's visual features, so two visually similar images yield similar hashes and can be compared cheaply with Hamming distance. I used it on screenshots of BIOS screens and it worked well in my case.
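As a minimal sketch of the idea, here is an average hash (aHash) in pure Python. In practice you would use a library such as `imagehash` (with Pillow) or OpenCV's `img_hash` module, and you would first convert to grayscale and resize to a small fixed grid; here the inputs are assumed to already be small 2D grayscale arrays.

```python
def average_hash(pixels):
    """Compute a simple average hash from a 2D grayscale image.

    `pixels` is a list of rows of intensities (0-255), assumed already
    resized to a small fixed grid (e.g. 8x8 in real pipelines).
    """
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    # Each bit records whether the pixel is above the mean intensity.
    return [1 if p > avg else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits; a small distance means visually similar."""
    return sum(a != b for a, b in zip(h1, h2))

# Two nearly identical 4x4 patches and one very different patch.
a = [[10, 10, 200, 200]] * 4
b = [[12, 11, 198, 202]] * 4   # same structure, slight noise
c = [[200, 200, 10, 10]] * 4   # reversed layout

ha, hb, hc = average_hash(a), average_hash(b), average_hash(c)
print(hamming(ha, hb))  # → 0  (likely the same box face)
print(hamming(ha, hc))  # → 16 (clearly a different face)
```

In your verification step you would hash the warped source-image face and the detected face, then accept the match only if the Hamming distance is below a threshold you tune on known-good pairs. Note that a plain aHash compares images at a fixed resolution, so to catch "same print, different size" you may need to compare at the original scale as well, or add a size check alongside the hash.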