I am working on integrating affiliate sales into a few existing sites. We are using a few merchants who work via different networks (CJ, ShareASale, LinkShare, Amazon).
My observation is that all these networks provide data feeds in different formats, but that's not a big problem. My main concern is that merchants use different titles for the same product. I don't want to run into these situations:
1) two or more listings of the SAME product from N merchants (if the titles are just a bit different)
2) one listing that merges N different products from different merchants (if we don't use a strict comparison algorithm)
We want to automate as much as possible and avoid having operators review questionable listings all the time.
How is this problem typically handled?
I believe a common solution here is to use a universal identifier (a UPC, an ISBN for books, etc.). If you can't do that, it becomes a difficult problem, and you probably won't get it 100% right. This may be a silly (and expensive) idea, but perhaps consider using the Amazon Mechanical Turk API to have people do it for you (at least for the difficult cases that your algorithm can't get right).
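To make the idea concrete, here is a rough sketch in Python of how such a pipeline might decide whether two feed records are the same product. It is only an illustration under several assumptions: the record fields `upc` and `title`, the function names, and both thresholds are hypothetical and would need tuning against your real feeds. It trusts the identifier when both records have one, falls back to a fuzzy title comparison (using the standard library's `difflib`) when they don't, and marks the uncertain middle band as `"review"` so that only those pairs go to a human or to Mechanical Turk.

```python
import re
from difflib import SequenceMatcher


def normalize_code(code):
    """Keep only the digits of an identifier such as a UPC (assumed field)."""
    return re.sub(r"\D", "", code or "")


def normalize_title(title):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(tokens))


def title_similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def classify_pair(a, b, match_at=0.9, review_at=0.6):
    """Classify two feed records as 'same', 'different', or 'review'.

    The thresholds are illustrative guesses, not recommended values.
    """
    code_a = normalize_code(a.get("upc"))
    code_b = normalize_code(b.get("upc"))
    if code_a and code_b:
        # Both merchants supplied an identifier: trust it over the title.
        return "same" if code_a == code_b else "different"

    score = title_similarity(a["title"], b["title"])
    if score >= match_at:
        return "same"
    if score >= review_at:
        # Too close to call automatically: queue for manual review.
        return "review"
    return "different"


if __name__ == "__main__":
    first = {"title": "Canon EOS 5D, Body", "upc": "0 12345 67890 5"}
    second = {"title": "EOS 5D body (Canon)", "upc": "012345678905"}
    print(classify_pair(first, second))
```

Raising `match_at` guards against situation 2 (different products merged into one listing), while lowering `review_at` sends more near-duplicates to review instead of letting situation 1 (duplicate listings) slip through, so the two thresholds let you trade automation against accuracy.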