I need to search a huge image database for possible duplicates using pHash, assuming the existing image records already have a hash code generated with pHash.
Now I have to compare a new image against the existing records, so I create its hash with pHash as well. But as far as I understand, the hash comparison is NOT as straightforward as
hash1 - hash2 < threshold
It looks like I need to pass both hash codes into a pHash API to do the matching. So I would have to retrieve all the hash codes from the DB in batches and compare them one by one using the pHash API.
But this does not look like the best approach if I have about 1000 images in the queue to be compared against millions of already existing images.
I need to know the following.
Thanks in advance.
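I think part of this question has been discussed on the pHash support forum.
You will need to use the mvptree storage mechanism, which indexes the hashes by distance so that a query does not have to be compared against every stored record: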
http://lists.phash.org/htdig.cgi/phash-support-phash.org/2011-May/000122.html and http://lists.phash.org/htdig.cgi/phash-support-phash.org/2010-October/000103.html
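
To give a rough idea of why a distance-based index helps, here is a minimal sketch. It assumes the hashes are the 64-bit DCT image hashes, which are compared by Hamming distance (the number of differing bits) rather than by subtraction. It is NOT pHash's mvptree: it uses a BK-tree, a simpler metric tree that relies on the same triangle-inequality pruning. The names BKTree, add and query, the image ids, and the threshold value are all made up for the illustration.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


class BKNode:
    def __init__(self, value: int, image_id: str) -> None:
        self.value = value
        self.ids: List[str] = [image_id]          # images sharing this exact hash
        self.children: Dict[int, "BKNode"] = {}   # keyed by distance to this node


class BKTree:
    """Hypothetical stand-in for a metric index such as an MVP tree."""

    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def add(self, value: int, image_id: str) -> None:
        if self.root is None:
            self.root = BKNode(value, image_id)
            return
        node = self.root
        while True:
            d = hamming(value, node.value)
            if d == 0:
                node.ids.append(image_id)         # identical hash, same node
                return
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(value, image_id)
                return
            node = child

    def query(self, value: int, threshold: int) -> List[Tuple[int, str]]:
        """Return (distance, image_id) pairs with distance <= threshold."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(value, node.value)
            if d <= threshold:
                results.extend((d, image_id) for image_id in node.ids)
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - threshold, d + threshold] can contain a match.
            for edge, child in node.children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return sorted(results)


if __name__ == "__main__":
    base = 0x0F0F0F0F0F0F0F0F
    tree = BKTree()
    tree.add(base, "img-a")
    tree.add(base ^ 0b111, "img-b")       # 3 bits away from img-a
    tree.add(base ^ 0xFFFF, "img-c")      # 16 bits away from img-a
    # Threshold 5 is an arbitrary example value, not a pHash recommendation.
    print(tree.query(base ^ 0b1, threshold=5))
```

With something like this, each of the queued images is looked up against the index instead of being compared with every stored hash one by one. For millions of records you would still want the on-disk mvptree described in the threads above, or an equivalent index, rather than an in-memory toy like this one.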