Product Matching w/ Datafeeds

Tireswing · May 11, 2010

I'm building a price comparison engine which, for now, will only be using nicely formatted XML datafeeds.

The feeds lack any universal product identifier (such as a manufacturer part number), so I have pre-parsing routines that normalize each product name into a standard format. I've been plowing through this project and only now realized that I don't know how to match up the products and record that they're matched.

Right now, I prep the feeds as arrays, so I figured I could just do the comparison when I have all the data sitting in the arrays, but I'm realizing that the methodology to do so is more complicated than I originally thought.

What is the most efficient way to find the duplicate entries? The array_unique function, as I understand it, won't help because it only returns the unique elements; it doesn't tell me that SKU 341434 @ Vendor1 is the same as SKU u8349 @ Vendor2.

Thanks for your help!
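
For illustration, here is a minimal sketch of the grouping the question is after, written in Python rather than PHP. It assumes each offer has already been reduced to a (vendor, SKU, normalized name) tuple; the product names and the `group_by_name` helper are made up for the example. Instead of discarding duplicates, it buckets offers by name and keeps the buckets that span more than one vendor.

```python
from collections import defaultdict

def group_by_name(offers):
    """Bucket (vendor, sku, name) tuples by their already-normalized name."""
    groups = defaultdict(list)
    for vendor, sku, name in offers:
        groups[name].append((vendor, sku))
    # Keep only names that show up at more than one vendor.
    return {
        name: hits
        for name, hits in groups.items()
        if len({vendor for vendor, _ in hits}) > 1
    }

# Hypothetical sample data; only the two SKUs come from the post above.
offers = [
    ("Vendor1", "341434", "acme widget 3000"),
    ("Vendor2", "u8349", "acme widget 3000"),
    ("Vendor2", "u9000", "acme gadget"),
]
matches = group_by_name(offers)
```

This only works as well as the name normalization does: two names that differ by a single character end up in different buckets.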

Tireswing · May 11, 2010

Or would it be easier to put all the data into a MySQL database and then, when I pull it up for display, check whether other entries have the same name and combine them?

Icecube · May 11, 2010

The name is not what you want to use to match two products, since it won't be consistent across merchants/networks.

Merchant A could call the product "ASUS G51JX-A1 15.6" Laptop" and merchant B "ASUS G51JXA1 15.6 Laptop".

You might want to use UPC, or whatever field or combination of fields you settle on in your system. I'm setting up a system that uses datafeed ID + SKU as the key for datafeed updates, and UPC to uniquely identify a product and find it at other merchants/networks.

I'm using a database.
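
A rough sketch of that two-key idea, in Python with plain dicts standing in for the database tables. The field names (`feed_id`, `sku`, `upc`), the `index_offers` helper, and the UPC value are all assumptions for the example, not taken from any real feed.

```python
from collections import defaultdict

def index_offers(rows):
    """Index feed rows two ways: by (feed_id, sku) and by UPC."""
    offers = {}
    for row in rows:
        # (feed_id, sku) identifies a row within one feed, so a re-import
        # of the same feed overwrites the old row instead of duplicating it.
        offers[(row["feed_id"], row["sku"])] = row

    by_upc = defaultdict(list)
    for key, row in offers.items():
        # UPC ties the same product together across feeds; rows without
        # a UPC are left out and need some other matching rule.
        if row.get("upc"):
            by_upc[row["upc"]].append(key)
    return offers, by_upc

# Hypothetical rows; the UPC is a placeholder value.
rows = [
    {"feed_id": "feedA", "sku": "A-100", "upc": "012345678905",
     "name": 'ASUS G51JX-A1 15.6" Laptop'},
    {"feed_id": "feedB", "sku": "B-7", "upc": "012345678905",
     "name": "ASUS G51JXA1 15.6 Laptop"},
]
offers, by_upc = index_offers(rows)
```

The two names differ, but both rows land under the same UPC, which is the point of not matching on the name.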

crackp0t · May 11, 2010

You can use the EAN/UPC to match products. You might want to check out Etilize or CommerceHub.

SeanW · May 12, 2010

I had done something similar a while back, and I built a canonical product table and linked all the individual vendor products to it. The pages were built off the canonical table, and I updated the products at the vendor level.

For example:
product(id, name, description, upc) # the canonical product
offer(name, description, vendor_sku, product_id) # the offer

To match them, I wrote some scripts that looked for the obvious matches (in my case, part numbers), and then wrote a simple web front end so that I could link up the non-obvious ones.
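
A minimal sketch of what such a first pass could look like, using Python's built-in sqlite3 module as a stand-in for the real database. The `part_number` columns are an assumption added for the example (they are not in the schema above), and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id INTEGER PRIMARY KEY, name TEXT, description TEXT, upc TEXT,
    part_number TEXT              -- assumed column, used for matching
);
CREATE TABLE offer (
    name TEXT, description TEXT, vendor_sku TEXT,
    part_number TEXT,             -- assumed column, as parsed from the feed
    product_id INTEGER REFERENCES product(id)
);
""")

# Hypothetical sample rows.
conn.execute(
    "INSERT INTO product (id, name, part_number) VALUES (1, ?, ?)",
    ("ASUS G51JX-A1 Laptop", "G51JX-A1"),
)
conn.executemany(
    "INSERT INTO offer (name, vendor_sku, part_number) VALUES (?, ?, ?)",
    [
        ('ASUS G51JX-A1 15.6" Laptop', "V1-001", "G51JX-A1"),
        ("ASUS G51JXA1 15.6 Laptop", "V2-abc", "G51JXA1"),
    ],
)

# Pass 1: link every offer whose part number exactly matches a product.
conn.execute("""
UPDATE offer
SET product_id = (SELECT p.id FROM product p
                  WHERE p.part_number = offer.part_number)
WHERE product_id IS NULL
""")

# Whatever is still unlinked goes to the manual review queue.
unmatched = conn.execute(
    "SELECT vendor_sku FROM offer WHERE product_id IS NULL"
).fetchall()
```

Here the second offer stays unlinked because its part number is written without the hyphen, which is the kind of case the manual front end would handle.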

FWIW, the XML feed may be formatted well, but you may find that vendors do stupid things like change/reuse SKUs and other stuff that makes it a pain.

Sean
