Product Matching w/ Datafeeds

Tireswing

New member
Apr 2, 2009
East Coast, USA
I'm building a price comparison engine which, for now, will only be using nicely formatted XML datafeeds.

The feeds lack any sort of universal product identifier (like a manufacturer code or something), so I have pre-parsing routines that format the product name into something standard. I've been plowing through this project and only now realized that I don't know how I should go about matching up the products and indicating that they're matched.

Right now, I prep the feeds as arrays, so I figured I could just do the comparison when I have all the data sitting in the arrays, but I'm realizing that the methodology to do so is more complicated than I originally thought.

What is the most efficient way to find the duplicate entries? The array_unique function, as I understand it, won't help me because it only returns a list of unique elements. It doesn't tell me that SKU 341434 @ Vendor1 is the same as SKU u8349 @ Vendor2.
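For what it's worth, one common way to get "which entries are the same" rather than "which values are unique" is to group the parsed rows by the standardized name. Below is a minimal Python sketch of that idea. The field names (vendor, sku, name) and the sample rows are assumptions for illustration, not the actual feed layout.

```python
from collections import defaultdict

# Hypothetical offers, as they might look once the feeds are parsed into arrays.
offers = [
    {"vendor": "Vendor1", "sku": "341434", "name": "acme widget 3000"},
    {"vendor": "Vendor2", "sku": "u8349", "name": "acme widget 3000"},
    {"vendor": "Vendor2", "sku": "u9001", "name": "acme gadget 12"},
]

def group_by_name(offers):
    """Map each standardized name to the (vendor, sku) pairs that carry it."""
    groups = defaultdict(list)
    for offer in offers:
        groups[offer["name"]].append((offer["vendor"], offer["sku"]))
    return dict(groups)

groups = group_by_name(offers)

# Keep only the names that appear on more than one offer.
matches = {name: refs for name, refs in groups.items() if len(refs) > 1}
print(matches)
```

This is a single pass over the data, so it avoids comparing every row against every other row. It is only as good as the name standardization feeding it.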

Thanks for your help!
 


Or would it be easier to simply put all the data into a MySQL database and then, when I pull it up for display, determine whether there are other entries with the same name and combine them?
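A rough sketch of the database variant, assuming a single hypothetical offer table with vendor, sku, name and price columns. It uses Python's built-in sqlite3 in place of MySQL so it runs without a server, but the GROUP BY query is plain SQL and should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; the real feed columns may differ.
conn.execute("CREATE TABLE offer (vendor TEXT, sku TEXT, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO offer VALUES (?, ?, ?, ?)",
    [
        ("Vendor1", "341434", "acme widget 3000", 19.99),
        ("Vendor2", "u8349", "acme widget 3000", 18.49),
        ("Vendor2", "u9001", "acme gadget 12", 5.00),
    ],
)

# One row per standardized name, with the offer count and the lowest price.
rows = conn.execute(
    "SELECT name, COUNT(*) AS offers, MIN(price) AS best_price "
    "FROM offer GROUP BY name ORDER BY name"
).fetchall()

for name, offer_count, best_price in rows:
    print(name, offer_count, best_price)
```

Combining at display time like this keeps the import step simple, at the cost of redoing the grouping on every query.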
 
The name is not what you want to use to match two products, since it won't be consistent across different merchants/networks.

Merchant A could call the product ASUS G51JX-A1 15.6" Laptop, and merchant B could call it ASUS G51JXA1 15.6 Laptop.
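To make the problem concrete, here is a small Python sketch using the two names above. The normalize_name helper is hypothetical: it just lowercases and strips everything except letters and digits. It happens to reconcile this particular pair, but a third made-up listing with different wording still falls through.

```python
import re

def normalize_name(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

merchant_a = 'ASUS G51JX-A1 15.6" Laptop'
merchant_b = "ASUS G51JXA1 15.6 Laptop"
merchant_c = "Asus G51JX-A1 Notebook 15.6in"  # made-up third listing

print(merchant_a == merchant_b)                                   # exact comparison
print(normalize_name(merchant_a) == normalize_name(merchant_b))   # punctuation ignored
print(normalize_name(merchant_a) == normalize_name(merchant_c))   # different wording
```

Stripping punctuation this aggressively also turns 15.6 into 156, so it can create false matches as well as miss real ones. That is the basic reason a real identifier is safer than the name.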

You might want to use the UPC, or any other field or combination of fields that works in your system. I'm setting up a system that uses datafeed ID + SKU to key datafeed updates, and the UPC to uniquely identify a product and find it across other merchants/networks.
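A minimal sketch of that two-key idea in Python, assuming each feed row carries hypothetical feed_id, sku, upc and price fields (the UPC values are just sample codes). The (feed_id, sku) pair identifies a row for updates, and the UPC groups rows from different feeds into one product.

```python
from collections import defaultdict

# Hypothetical feed rows; the field names are assumptions.
rows = [
    {"feed_id": 1, "sku": "341434", "upc": "012345678905", "price": 19.99},
    {"feed_id": 2, "sku": "u8349", "upc": "012345678905", "price": 18.49},
    {"feed_id": 2, "sku": "u9001", "upc": "036000291452", "price": 5.00},
]

# (feed_id, sku) is the update key: re-importing a feed overwrites its own rows.
offers = {}
for row in rows:
    offers[(row["feed_id"], row["sku"])] = row

# A later run of feed 1 with a new price replaces the old row instead of adding one.
offers[(1, "341434")] = {"feed_id": 1, "sku": "341434", "upc": "012345678905", "price": 17.99}

# The UPC is the cross-merchant key: every offer for the same product lands together.
by_upc = defaultdict(list)
for key, row in offers.items():
    by_upc[row["upc"]].append(key)

print(dict(by_upc))
```

Keeping the two keys separate means a vendor-side SKU never has to mean anything outside its own feed.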

I'm using a database.
 
You can use the EAN/UPC to match products. You might want to check out Etilize or CommerceHub.
 
I had done something similar a while back, and I built a canonical product table and linked all the individual vendor products to it. The pages were built off the canonical table, and I updated the products at the vendor level.

e.g.
product(id, name, description, upc) # the canonical product
offer(name, description, vendor_sku, product_id) # the offer

To match them, I wrote some scripts that looked for the obvious matches (in my case, part numbers), and then wrote a simple web front end so that I could link up the non-obvious ones.
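A rough Python sketch of what such a setup could look like, using the two tables above with the built-in sqlite3 module. The part_number columns and the sample rows are assumptions added for illustration. The actual scripts may have worked quite differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The canonical product; part_number is an assumed extra column.
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, description TEXT,
                      upc TEXT, part_number TEXT);
-- One row per vendor listing; product_id stays NULL until it is linked.
CREATE TABLE offer (name TEXT, description TEXT, vendor_sku TEXT,
                    part_number TEXT,
                    product_id INTEGER REFERENCES product(id));
""")
conn.execute(
    "INSERT INTO product (id, name, part_number) VALUES (1, 'ASUS G51JX-A1', 'G51JX-A1')"
)
conn.executemany(
    "INSERT INTO offer (name, vendor_sku, part_number) VALUES (?, ?, ?)",
    [
        ('ASUS G51JX-A1 15.6" Laptop', "341434", "G51JX-A1"),
        ("ASUS G51JXA1 15.6 Laptop", "u8349", None),  # feed gave no part number
    ],
)

# Link the obvious matches: offers whose part number equals a product's.
conn.execute("""
UPDATE offer
SET product_id = (SELECT p.id FROM product p
                  WHERE p.part_number = offer.part_number)
WHERE product_id IS NULL AND part_number IS NOT NULL
""")

linked = conn.execute(
    "SELECT vendor_sku, product_id FROM offer WHERE product_id IS NOT NULL"
).fetchall()
# Whatever is still unlinked is what a manual review screen would show.
review_queue = conn.execute(
    "SELECT vendor_sku FROM offer WHERE product_id IS NULL"
).fetchall()
print(linked, review_queue)
```

Once a link is stored in product_id, later feed updates only touch the offer row, so the manual matching work is not lost on re-import.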

FWIW, the XML feed may be formatted well, but you may find that vendors do stupid things like change/reuse SKUs and other stuff that makes it a pain.

Sean