I am working on integrating affiliate sales into a few existing sites. We are using a few merchants who work via different networks (CJ, ShareASale, LinkShare, AvantLink).
All of these networks provide data feeds in different formats, but that's not a big problem. My main concern is merchants using different titles for the same products. I don't want to run into these situations:
a) two listings of the SAME product from N merchants (if titles are just a bit different)
b) one listing of N different products from merchants (if we don't use a strict comparison algorithm)
We want to automate as much as possible and avoid having operators constantly review questionable listings.
How is this problem typically handled?
We have a similar issue with trying to collapse products from multiple merchant feeds. What we do is collapse products based on their brand (or manufacturer) + sku combo.
Our data is pretty messy, so we have to do some work to normalize both the brand and the sku so the products collapse nicely. We keep a list of brands that we care about and map the brand names found in the merchant feeds to our own brands. For example, if we have an "ACME" brand in our system, we might map the following to that brand:
A.C.M.E => ACME
ACME Inc. => ACME
Acme Incorporated => ACME
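The brand mapping above could be sketched roughly as follows. This is only an illustration, not our actual code: the `CANONICAL_BRANDS` and `CORPORATE_SUFFIXES` sets and the `normalize_brand` function are hypothetical names, and the rule of dropping dots and trailing corporate suffixes is an assumption that happens to cover the three examples.

```python
import re

# Hypothetical list of brands we care about (assumption for illustration).
CANONICAL_BRANDS = {"ACME"}
# Hypothetical trailing words to ignore when matching brand names.
CORPORATE_SUFFIXES = {"INC", "INCORPORATED", "LLC", "LTD", "CORP", "CO"}

def normalize_brand(raw):
    """Map a merchant's brand string to a canonical brand, or None if unknown."""
    # Uppercase and drop dots so "A.C.M.E" becomes "ACME".
    cleaned = raw.upper().replace(".", "")
    # Split on any remaining non-alphanumeric characters.
    tokens = [t for t in re.split(r"[^A-Z0-9]+", cleaned) if t]
    # Strip trailing corporate suffixes such as "INC".
    while tokens and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    return key if key in CANONICAL_BRANDS else None

for name in ("A.C.M.E", "ACME Inc.", "Acme Incorporated"):
    print(name, "=>", normalize_brand(name))
```

Brands that do not resolve to anything on the list come back as `None`, so they can be queued for a one-time manual mapping rather than guessed at.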
For skus we usually just strip any non-alphanumeric characters for matching purposes. For example, all of the following would map to the same sku:
abc-123 => abc123
abc.123 => abc123
abc 123 => abc123
ab.c1.23 => abc123
So if we see brand "ACME Inc." and sku "abc-123" in one feed, it will collapse with brand "A.C.M.E" and sku "abc 123" from another feed.
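A minimal sketch of the sku normalization and the collapse step, assuming each feed row is a dict with `brand` and `sku` keys and that the brand has already been mapped to its canonical form. The function names and the row layout are made up for illustration, and lowercasing the sku is an extra assumption on top of stripping non-alphanumeric characters.

```python
import re
from collections import defaultdict

def normalize_sku(raw):
    """Strip everything except letters and digits (case-folded, an assumption)."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())

def collapse(items):
    """Group feed rows by (canonical brand, normalized sku)."""
    groups = defaultdict(list)
    for item in items:
        key = (item["brand"], normalize_sku(item["sku"]))
        groups[key].append(item)
    return dict(groups)

# Hypothetical rows from two merchants; brand is already canonical here.
rows = [
    {"merchant": "m1", "brand": "ACME", "sku": "abc-123"},
    {"merchant": "m2", "brand": "ACME", "sku": "abc 123"},
    {"merchant": "m2", "brand": "ACME", "sku": "xyz-9"},
]
for key, members in collapse(rows).items():
    print(key, [m["merchant"] for m in members])
```

Keying on the pair rather than on the sku alone keeps two brands that happen to share a part number from being merged into one listing.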
As part of the collapsing process we end up with multiple names, images, descriptions, categories, etc. for each collapsed product and need to choose the "best" one to show on the website.
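How "best" is defined depends on the data. As one possible heuristic (an assumption for illustration, not a description of our actual rules), you could take the longest non-empty value for each field across the collapsed rows. The `pick_best` and `merge_listing` names and the field list are hypothetical.

```python
def pick_best(values):
    """Return the longest non-empty value, or None if there is none."""
    candidates = [v.strip() for v in values if v and v.strip()]
    return max(candidates, key=len) if candidates else None

def merge_listing(items, fields=("name", "description", "image")):
    """Build one display record from all rows of a collapsed product."""
    return {f: pick_best(item.get(f) for item in items) for f in fields}

# Hypothetical rows for one collapsed product.
rows = [
    {"name": "Widget", "description": ""},
    {"name": "ACME Widget 123", "description": "A sturdy widget."},
]
print(merge_listing(rows))
```

Other reasonable choices include ranking merchants by how much you trust their data, or preferring the value that appears in the most feeds.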
That's a very high level overview of how we handle it.